A CONTRIBUTION TO THE THEORY OF SELF-RENEWING
AGGREGATES, WITH SPECIAL REFERENCE TO
INDUSTRIAL REPLACEMENT


By Alfred J. Lotka


1. Introduction. The analysis of problems of industrial replacement forms 
part of the more general analysis of problems presented by “self-renewing 
aggregates.”¹ While the subject could, therefore, be treated in general and
consequently rather abstract terms, for the purpose of exposition it will be 
advantageous to relate the discussion to concrete applications. These, in the 
past, have been mainly of two kinds, namely, first, applications to population 
analysis with related problems in genetics on the one hand and actuarial prob- 
lems on the other; and second, applications to industrial replacement. As the 
fundamental setting of the two types of problems is very similar, leading in 
each case to certain integral equations, it will be advantageous to consider 
together both problems, or both phases of the general problem. This will 
incidentally give us an opportunity to observe the analogy, but also certain 
points of difference, between the two aspects of the problem. 

Historically, the investigation of an actuarial problem came first. L. Herbelot²
(1909) examined the number of annual accessions required to maintain a
body of N policyholders constant, as members drop out by death. He assumes
an initial body of N “charter” members at time t = 0, all of the same age, which
for simplicity may be called age zero, since this merely amounts to fixing an
arbitrary origin of the age scale. He further assumes the same uniform age at
entry for each “new” member.

Then, if p(t) is the probability at the age of entry of surviving t years, the
survivors of charter members at time t will number Np(t); and if f(τ) is the
rate per head at which members drop out by death at time τ, being then immediately
replaced by a new member of the fixed age of entry, then the survivors
at time t of “new” members will evidently be given by

$$N \int_0^t f(\tau)\, p(t - \tau)\, d\tau$$


1 I use here an English equivalent, as nearly as possible, to the German phrase “sich
erneuernde Gesamtheiten,” used by Swiss actuaries.

2 Herbelot’s original paper is disfigured with a number of misprints. It is essentially 
reproduced, with the errors corrected, in a paper by R. Risser (1912). The same treatment 
of the problem is also given by Zwinggi (1931) and by Schulthess (1935), (1937). 


Hence, the condition for a constant membership N is

$$N p(t) + N \int_0^t f(\tau)\, p(t - \tau)\, d\tau = N \tag{1}$$

or

$$p(t) + \int_0^t f(\tau)\, p(t - \tau)\, d\tau = 1. \tag{2}$$

Differentiating with regard to t, and remembering that p(0) = 1, we have

$$p'(t) + \int_0^t f(\tau)\, p'(t - \tau)\, d\tau + f(t) = 0. \tag{3}$$

Equation (3) may be written

$$f(t) = -p'(t) - \int_0^t f(\tau)\, p'(t - \tau)\, d\tau \tag{4}$$

or, putting (t − τ) = a,

$$f(t) = -p'(t) - \int_0^t f(t - a)\, p'(a)\, da. \tag{5}$$

For the solution of the integral equation thus obtained Herbelot uses the 
method of successive differentiations,³ duly pointing out its limitations, and
applying it to several specific expressions for the survival function p(a). 

There is nothing in Herbelot’s treatment to limit its application to living 
organisms. It is directly applicable to the problem of industrial replacement 
of an equipment comprising N original units installed at time t = 0, and main- 
tained constant by the replacement of disused units with new. 
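A numerical sketch may help fix ideas about equation (5). The fragment below is not Herbelot's method of successive differentiations; it is a plain step-by-step discretization, offered under the assumption that units replaced within a time cell enter service at the end of that cell. The names renewal_rate, t_max and steps are illustrative only.

```python
import math

def renewal_rate(p, t_max, steps):
    """Approximate f(t) in equation (5) on a uniform grid of `steps` cells.

    p is the survival function with p(0) = 1. Assumption of this sketch:
    units replaced during a cell enter service at the end of that cell.
    """
    h = t_max / steps
    # phi[k] approximates -p'(a) on the k-th age cell (fraction failing per unit time)
    phi = [(p(k * h) - p((k + 1) * h)) / h for k in range(steps)]
    f = []
    for n in range(steps):
        # failures, during cell n, of units that were themselves replacements
        repeat = h * sum(f[n - 1 - k] * phi[k] for k in range(n))
        f.append(phi[n] + repeat)
    return f

# Example (hypothetical): exponential survival, whose exact renewal rate is constant
rate = 0.5
f = renewal_rate(lambda a: math.exp(-rate * a), t_max=5.0, steps=500)
```

With exponential survival the computed rate stays constant and close to 0.5 per head per unit time, as it should, since units that fail at a constant rate must be replaced at that same rate.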

Next in chronological order, of publications dealing with the type of problem 
with which we are here concerned, is a paper by Sharpe and Lotka (1911), who 
use Hertz’s form of solution for the integral equation involved.⁴ To this I wish


3 This method is also followed in dealing with the problem of renewal by Risser (1912), 
(1920); Zwinggi (1931); Schulthess (1935), (1937); Preinreich (1938). All these authors 
applied their reflections to arbitrarily assumed frequency distributions for the renewal 
function, of simple analytical form. For example, among the more recent applications is
one by Schulthess, who uses the function $p(t) = (1 - t/\omega)^{i}$; and quite recently, Preinreich
has suggested the use of a Type I Pearson frequency curve on the basis of Kurtz’s observational
data. It is to be noted, however, that when it comes to actual application, Preinreich
does not use an ordinary Pearson Type I curve nor actual observational data of any
kind, but very conveniently simplifies the Pearson formula by giving integral values,
namely 1 and 2, to the exponents, thereby reducing to triviality the task of applying the
method of differentiation. None of these authors makes any attempt to deal with actual
numerical observations which, in practice, fall far wide of any of the simple analytical
formulae employed by them.
4 P. Hertz, Mathematische Annalen, 1908, vol. 65, pp. 84 to 86. 


to refer in some detail, adding to the original exposition in the light of later 
developments. The treatment of the subject proceeds here along somewhat 
broader lines, but, with obvious changes in the meaning of the symbols, and 
with certain modifications and limitations which are themselves of interest, 
the development is immediately applicable to economic systems composed of 
units having a characteristic “mortality” in use. 

A population of living organisms, unlike industrial equipment, has practically 
no beginning. We know its existence only as a continuing process. Accord- 
ingly the equation for its development is most naturally framed without explicit 
reference to any ‘‘charter members.” 

The basis of the analysis is as follows: 

In a population growing solely by excess of births over deaths (i.e. in the
absence of immigration and emigration), the annual female births B(t) at time t
are the daughters of mothers a years old, born at time (t − a) when the annual
female births were B(t − a). If fertility and mortality are constant and such
that a fraction p(a) of all births survive to age a, and are then reproducing at
an average rate m(a) daughters per head per annum, then, evidently,⁵


$$B(t) = \int_0^\infty B(t - a)\, p(a)\, m(a)\, da \tag{6}$$

$$\phantom{B(t)} = \int_0^\infty B(t - a)\, \varphi(a)\, da. \tag{7}$$

This is the fundamental equation in its original form, and, as noted above, 
it does not explicitly refer to any initial state, though, as will be seen presently, 
in order to make the problem determinate, data regarding the system at some 
particular period must be given. For the present we note that (7) can be 
written 


$$B(t) = \int_t^\infty B(t - a)\, \varphi(a)\, da + \int_0^t B(t - a)\, \varphi(a)\, da \tag{8}$$

$$B(t) = B_1(t) + \int_0^t B(t - a)\, \varphi(a)\, da. \tag{9}$$


It is to be noted that the right hand member of (8) splits the total births B(t)
into two sections, those for which (t − a) < 0, that is, births of daughters whose
mothers were born before t = 0; and those for which (t − a) > 0, that is, births
of daughters whose mothers were born after t = 0. The former section is
denoted by B₁(t) in (9). The function B₁(t) thus defined will be found, in the


5 Here and elsewhere in these developments the limits of the integral have, for simplicity,
been written 0 and ∞. This ensures the inclusion of all nonvanishing terms in the integrand;
the inclusion of terms for which either φ(a) or B(t − a) vanishes does not, of course,
affect the value of the integral. If φ(a) is represented between the limits α, ω of the reproductive
period by some analytical expression, such as a Pearson frequency function, it is, of
course, understood that outside the range α, ω we must put φ(a) = 0.
further development, to play a significant rôle. Here it will suffice to point out
that it vanishes for all values of t greater than ω, the upper limit of the reproductive
period, because φ(a) vanishes for these values of a.


2. Special case. A case of special interest is that in which B,(t) represents 
the births of daughters whose mothers were all born in an interval of time 
t = −dt to t = 0. In that case the first integral in (8) reduces to a single term,
so that

$$B(t) = B(0)\, \varphi(t)\, dt + \int_0^t B(t - a)\, \varphi(a)\, da \tag{10}$$

or, putting

$$B(0)\, dt = N_0, \tag{11}$$

$$B(t) = N_0\, \varphi(t) + \int_0^t B(t - a)\, \varphi(a)\, da. \tag{12}$$


This last equation holds also if a finite number of births take place (or are 
regarded as taking place) at a point of time t = 0. 

Equations (10) and (12) are of interest as basic for the examination of the
progeny of an infinitesimal population element,⁶ that is, of a “zero” generation,
born at time zero. In that case B₁(t) is the annual rate of births in the “first”
generation, and is simply proportional to φ(t), i.e.

$$B_1(t) = N_0\, \varphi(t). \tag{13}$$


For the sake of greater generality the development has so far been given in
terms of the phenomenon of replacement (reproduction) as it presents itself
in a population of living organisms. But it should be noted here that, with
appropriate changes in the meaning of φ(a), equation (12) is directly applicable
to the problem of industrial renewal in an installation set up at
some point of time and maintained at a constant level by the replacement of
each unit by a new one, the moment it is disused. In that case the “rate per
head of reproduction” m(a) at age a is evidently the same thing as the “death
rate per head” at age a, namely

$$\mu(a) = -\frac{1}{p(a)} \frac{dp(a)}{da} = -\frac{p'(a)}{p(a)} \tag{14}$$

so that

$$\varphi(a) = p(a)\, \mu(a) \tag{15}$$

becomes

$$\varphi(a) = -p'(a). \tag{16}$$
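A one-line check, added here as an illustration and assuming only that p(0) = 1 and that every unit is eventually disused, so that p(a) tends to zero at great ages, shows that in the industrial case the function in (16) is a proper frequency distribution of total area unity:

```latex
\int_0^\infty \varphi(a)\,da
  = -\int_0^\infty p'(a)\,da
  = p(0) - \lim_{a \to \infty} p(a)
  = 1 .
```

In other words, each unit installed is replaced exactly once, in contrast with a living population, where the corresponding integral of p(a)m(a) may be greater or less than unity.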


6 A. J. Lotka, (1928), (1929).


Reverting now to the fundamental equation in its first form (6), a trial
substitution

$$B(t) = Q e^{rt} \tag{17}$$

is found to satisfy this equation, provided that r is a root of the characteristic
equation

$$\int_0^\infty e^{-ra}\, \varphi(a)\, da = 1. \tag{18}$$

We may speak of (17) as a particular solution of (6) or (7). It is easily seen
that the sum of such particular solutions is also a solution, i.e.

$$B(t) = Q_1 e^{r_1 t} + Q_2 e^{r_2 t} + \cdots \tag{19}$$

where r₁, r₂, etc., are roots of the characteristic equation (18).⁷
For real values of r the function

$$F(r) = \int_0^\infty e^{-ra}\, \varphi(a)\, da \tag{20}$$

decreases monotonically as r increases, since, from its nature, φ(a) ≥ 0 for all
values of a. Hence (18) can have only one real root r₁, and we shall have

$$r_1 \gtreqless 0 \quad \text{according as} \quad \int_0^\infty \varphi(a)\, da \gtreqless 1. \tag{21}$$

If u + iv is a complex root of (18) then

$$1 = \int_0^\infty e^{-ua} \cos va\; \varphi(a)\, da \tag{22}$$

$$0 = \int_0^\infty e^{-ua} \sin va\; \varphi(a)\, da \tag{23}$$

and it is evident from (22) that u < r₁, since cos (va) ≤ 1 for all values of a.
The real part of any complex root of (18) is, therefore, algebraically less than
the real root r₁.

This reasoning⁸ is evidently quite independent of the particular form of φ(a),
and is thus equally true, whether φ(a) be given in purely empirical form (defined
by a table of values), or as a standard form of frequency curve, such as for 
example a Pearson curve of suitable type. 
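Because F(r) decreases monotonically, the single real root of (18) can be located by simple bisection when the curve is given only as a table. The following Python sketch assumes the integral in (20) has been replaced by a finite sum over tabulated ages, each weight standing for the area under the curve in its age interval; the function names, the bracket [−5, 5] and the figures in the example are hypothetical.

```python
import math

def F(r, ages, weights):
    """Finite-sum stand-in for F(r) in (20): each weight is the area of one age cell."""
    return sum(w * math.exp(-r * a) for a, w in zip(ages, weights))

def real_root(ages, weights, lo=-5.0, hi=5.0, tol=1e-12):
    """Bisection on F(r) = 1; valid because F decreases monotonically in r."""
    if not (F(lo, ages, weights) > 1.0 > F(hi, ages, weights)):
        raise ValueError("bracket [lo, hi] does not contain the real root")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid, ages, weights) > 1.0:
            lo = mid   # F still too large: the root lies at larger r
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example (hypothetical figures): total reproduction 4 per head, all at age 2
r1 = real_root([2.0], [4.0])
```

In this example the root is log 2, positive because the total area exceeds unity, in agreement with (21); a table whose weights sum to unity, as in industrial replacement, gives the root zero.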

The roots of (18) can be determined directly, though rather laboriously, from 





7 For a discussion of the convergence of the series (19) see G. Herglotz, Mathem. Annalen, 
1908, vol. 65, pp. 87 et seq. 

8 Adapted from P. Hertz, Math. Annalen, 1908, vol. 65, pp. 1-86; G. Herglotz, ibid. pp. 
87-106. The Hertz solution is also applied to a similar problem by J. B. S. Haldane, Proc.
Cambridge Phil. Soc., 1926, vol. 23, p. 607. A particularly detailed development is given by 
H. T. J. Norton, Proc. London Math. Soc., 1926, vol. 28, p. 21. 
equations (22) and (23); or, they can be brought into relation with the Thiele
seminvariants μ of the function φ(a) defined by

$$F(r) = \int_0^\infty e^{-ra}\, \varphi(a)\, da = m_0 \exp\!\left(-\mu_1 r + \frac{\mu_2}{2!} r^2 - \frac{\mu_3}{3!} r^3 + \cdots\right) \tag{24}$$

where mₙ is the nth moment of φ(a) and the seminvariants μ can be computed
from the moments by the algorithm

$$\begin{aligned}
m_1 &= \mu_1 m_0 \\
m_2 &= \mu_1 m_1 + \mu_2 m_0 \\
m_3 &= \mu_1 m_2 + 2\mu_2 m_1 + \mu_3 m_0 \\
m_4 &= \mu_1 m_3 + 3\mu_2 m_2 + 3\mu_3 m_1 + \mu_4 m_0
\end{aligned} \tag{25}$$

etc.
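Algorithm (25) can be read as a recurrence with binomial coefficients, each moment being a sum of products of one seminvariant with one lower moment. The Python sketch below solves that recurrence for the seminvariants in turn; the function name is illustrative, and the check uses the moments of a Poisson distribution with mean 2 (an example not in the original), all of whose seminvariants equal 2.

```python
from math import comb

def seminvariants(moments):
    """Invert (25): moments[n] = sum over k of C(n-1, k-1) * mu[k] * moments[n-k]."""
    m0 = moments[0]
    mu = [None]  # mu[0] is unused; seminvariants are indexed from 1
    for n in range(1, len(moments)):
        # contribution of the seminvariants already found (orders 1 .. n-1)
        known = sum(comb(n - 1, k - 1) * mu[k] * moments[n - k] for k in range(1, n))
        mu.append((moments[n] - known) / m0)
    return mu[1:]

# Moments about the origin of a Poisson distribution with mean 2 (m0 = 1)
poisson_moments = [1, 2, 6, 22, 94]
mu = seminvariants(poisson_moments)
```

Since the relations (25) are linear and homogeneous in the moments, the same routine accepts moments that have not been divided by m₀.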


In terms of these seminvariants the characteristic equation (18) becomes

$$\log_e m_0 - \mu_1 r + \frac{\mu_2}{2!} r^2 - \frac{\mu_3}{3!} r^3 + \cdots = \log_e 1 = 2\pi n i \tag{26}$$

where n takes on all positive and negative integral values. Separating the real
and imaginary parts in (26), and retaining seminvariants up to the fourth,

$$\psi(u, v) = \frac{\mu_4}{4!}(u^4 - 6u^2v^2 + v^4) - \frac{\mu_3}{3!}\, u(u^2 - 3v^2) + \frac{\mu_2}{2!}(u^2 - v^2) - \mu_1 u + \log_e m_0 = 0 \tag{27}$$

$$\chi(u, v) = \frac{\mu_4}{3!}\, uv(u^2 - v^2) + \frac{\mu_3}{3!}\, v(v^2 - 3u^2) + \mu_2 uv - \mu_1 v = 2\pi n. \tag{28}$$

If φ(a) does not differ too widely from the normal (Gaussian) distribution, so
that seminvariants of higher than second order can be neglected for roots in the
neighborhood of u = 0, v = 0, we shall have, approximately⁹

$$\frac{\mu_2}{2}(u^2 - v^2) - \mu_1 u + \log_e m_0 = 0 \tag{29}$$

$$\left(u - \frac{\mu_1}{\mu_2}\right) v = \frac{2\pi n}{\mu_2} \tag{30}$$


9 The relations which follow hold exactly if φ(a) is actually a normal curve. It should be
noted, however, that this can not be strictly the case, since the infinite tail of the curve on
the negative side would imply replacement or reproduction antedating the original installation
or zero generation. Nevertheless, a normal frequency curve will be admissible if the
part of the curve extending into the negative age field is negligible. For a concrete example
(electric light bulbs) see E. J. Gumbel, “Die Verteilung der Gestorbenen um das Normalalter,”
Aktuarske Vedy (Praze), 1933, p. 90.
or, putting

$$u - \frac{\mu_1}{\mu_2} = U \tag{31}$$

we have

$$U^2 - v^2 = \left(\frac{\mu_1}{\mu_2}\right)^2 - \frac{2 \log_e m_0}{\mu_2} \tag{32}$$

$$Uv = \frac{2\pi n}{\mu_2}. \tag{33}$$

It is thus seen that in these circumstances the roots u, v correspond to the
points of intersection of the hyperbola (32) centered at u = μ₁/μ₂, v = 0, with
a family of hyperbolas (33) concentric with (32), but with their axes at 45° to
those of (32).

The intersections of the hyperbolas (33) with the axis of v are given by
putting u = 0 in (30), namely

$$v = -\frac{2\pi n}{\mu_1}. \tag{30a}$$

This also gives, approximately, the frequency of the oscillatory components for
which u is sufficiently small. In particular, for the first component, we have,
in that case

$$v = \frac{2\pi}{\mu_1} \tag{30b}$$

so that its wave length is (approximately) μ₁, the mean of the φ(a) curve.

These facts are illustrated in Fig. 1, drawn to scale according to the vital 
statistics of the United States, 1920, for which the requisite computations were 
available from prior publications (Lotka, (1928), (1929)). The diagram is 
drawn in full, showing four intersections of each hyperbola of the family (33). 
Actually values of v occur in pairs, corresponding to conjugate roots u ± iv.
The intersections in the two upper quadrants must be disregarded, as they do not 
correspond to roots of (18). 

To simplify notation let us write (32), (33) in the form

$$U^2 - v^2 = K \tag{32a}$$

$$Uv = C. \tag{33a}$$

Solving for U², v² we find

$$U^2 = \tfrac{1}{2}\left\{K + \sqrt{K^2 + 4C^2}\right\} \tag{34}$$

$$v^2 = \tfrac{1}{2}\left\{-K + \sqrt{K^2 + 4C^2}\right\} \tag{35}$$

from which, incidentally, it is seen that

$$U^2 + v^2 = \sqrt{K^2 + 4C^2} \tag{36}$$

and hence, that the intersections of the hyperbola (32) with (33) lie on circles
of radius

$$R = \sqrt[4]{K^2 + 4C^2}. \tag{37}$$
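The closed forms (31) to (35) lend themselves to direct computation. The Python sketch below returns the root u + iv of the truncated (normal) characteristic equation for a given integer n. It takes the branch U < 0, on the ground stated earlier that the real part of a complex root lies below the real root; the function name and argument names are illustrative, and the figures in the example (mean 10, second seminvariant 6.7192, total area unity) are those of the industrial curve treated later in the paper.

```python
import math

def normal_approx_root(mu1, mu2, log_m0, n):
    """Root (u, v) of (29), (30) for integer n, via the substitution (31)."""
    K = (mu1 / mu2) ** 2 - 2.0 * log_m0 / mu2   # right hand member of (32)
    C = 2.0 * math.pi * n / mu2                  # right hand member of (33)
    radius_sq = math.sqrt(K * K + 4.0 * C * C)   # U^2 + v^2, equation (36)
    U = -math.sqrt(0.5 * (K + radius_sq))        # (34), branch U < 0 (see lead-in)
    v = C / U if U != 0.0 else 0.0               # (33a): U v = C
    return U + mu1 / mu2, v                      # (31): u = U + mu1/mu2

# First oscillatory component; n = -1 gives the member of the conjugate pair with v > 0
u, v = normal_approx_root(10.0, 6.7192, 0.0, -1)
```

For these figures the sketch gives u of about −0.11 and v of about 0.58, to be compared with −.11009 and .57767 in Table III, where the third and fourth seminvariants are retained.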
When the third and fourth moments (and therefore third and fourth seminvariants)
are taken into account¹⁰ the hyperbolas become distorted into new
curves, though the general topographic features of the diagram tend to be
preserved. In particular, the property of orthogonality of intersection of the
curves (32) with (33) is preserved, in accordance with a well-known property of
conjugate functions.¹¹ This is shown in the left hand panel of Fig. 2, drawn
for the same data as Fig. 1, but including not only the hyperbolic curves, but
also the corresponding modified curves obtained by retaining the third and
fourth seminvariants in the computation.¹² Only the quadrant relevant to the
location of the roots is shown.


3. The coefficients Q in the solution (19). These are determined by initial
conditions, being, in fact, related to the function B₁(t). As their determination
in the original paper by Hertz and Herglotz is rather complicated, the following
relatively simple method, resembling that by which the constants in a Fourier
series are determined, is of interest:

Multiplying equation (9) by e^(−rᵢt), where rᵢ is a root of (18), transposing
terms, and integrating between the limits 0 and ω, where ω is the highest age for
which φ(a) has a value other than zero, we have

$$\int_0^\omega e^{-r_i t} B_1(t)\, dt = \int_0^\omega e^{-r_i t} \left\{ B(t) - \int_0^t B(t - a)\, \varphi(a)\, da \right\} dt. \tag{38}$$

Introducing the solution (19) in the right hand member of (38), we obtain

$$\int_0^\omega e^{-r_i t} B_1(t)\, dt = \sum_j Q_j \int_0^\omega e^{-r_i t} \left\{ e^{r_j t} - \int_0^t e^{r_j (t - a)} \varphi(a)\, da \right\} dt \tag{39}$$

$$= \sum_j P_{ij} \qquad (j = 1, 2, 3, \ldots). \tag{40}$$

Consider now a particular term Pᵢⱼ in the sum Σ. Multiplying out the exponentials
we obtain

$$P_{ij} = Q_j \int_0^\omega e^{(r_j - r_i) t} \left\{ 1 - \int_0^t e^{-r_j a} \varphi(a)\, da \right\} dt \tag{41}$$

which, in view of the characteristic equation (18), reduces to

$$P_{ij} = Q_j \int_0^\omega \int_t^\omega e^{(r_j - r_i) t}\, e^{-r_j a} \varphi(a)\, da\, dt \tag{42}$$

$$= Q_j \int_0^\omega e^{-r_j a} \varphi(a) \int_0^a e^{(r_j - r_i) t}\, dt\, da. \tag{43}$$

Hence, if i ≠ j,

$$P_{ij} = \frac{Q_j}{r_j - r_i} \int_0^\omega e^{-r_j a} \varphi(a) \left\{ e^{(r_j - r_i) a} - 1 \right\} da \tag{44}$$


10 Which is as far as curve fitting by Pearson’s method goes. 

11 See, for example, W. E. Byerly, Integral Calculus, 1888, p. 289.

12 For a given value of u equation (27) is a biquadratic in v, and equation (28) is a cubic 
in v lacking the second degree term. The computation of the curves is in consequence 
relatively simple. 
$$= \frac{Q_j}{r_j - r_i} \left\{ \int_0^\omega e^{-r_i a} \varphi(a)\, da - \int_0^\omega e^{-r_j a} \varphi(a)\, da \right\} \tag{45}$$

$$= 0 \tag{46}$$

since rᵢ and rⱼ are both roots of (18). But if i = j, then (44) is of the indeterminate
form 0/0 and we must refer back to equation (43), from which, with
i = j, we obtain, instead of (44), a different expression, namely

$$P_{ii} = Q_i \int_0^\omega e^{-r_i a} \varphi(a) \int_0^a dt\, da \tag{47}$$

$$= Q_i \int_0^\omega a\, e^{-r_i a} \varphi(a)\, da \tag{48}$$

so that the only term in the sum Σ in equation (40) that does not vanish is the
term Pᵢᵢ and finally

$$Q_i = \frac{P_{ii}}{\int_0^\omega a\, e^{-r_i a} \varphi(a)\, da} \tag{49}$$

$$= \frac{\int_0^\omega e^{-r_i t} B_1(t)\, dt}{\int_0^\omega a\, e^{-r_i a} \varphi(a)\, da} \tag{50}$$

$$= \frac{\int_0^\omega e^{-r_i t} \left\{ B(t) - \int_0^t B(t - a)\, \varphi(a)\, da \right\} dt}{\int_0^\omega a\, e^{-r_i a} \varphi(a)\, da} \tag{51}$$

or, finally, in view of (20),

$$= -\,\frac{\int_0^\omega e^{-r_i t} \left\{ B(t) - \int_0^t B(t - a)\, \varphi(a)\, da \right\} dt}{\left[ F'(r) \right]_{r = r_i}}. \tag{52}$$

The coefficients Q are thus fully determined by (50) or its equivalents (51)
or (52), when initial conditions are given, that is, when the function B₁(t) is
given for 0 < t < ω or, what amounts to the same thing, when B(t) is known
for this range of values of t. For complex roots the denominator in (52)
becomes,¹³ in view of (27), (28)

$$-\left[\frac{dF(r)}{dr}\right]_{r = r_i} = -\left\{ \frac{\partial \psi}{\partial u} + i\, \frac{\partial \chi}{\partial u} \right\} = G - iH \tag{53}$$








13 Since rᵢ is a root of F(r) = 1, we have

$$\left[\frac{dF(r)}{dr}\right]_{r = r_i} = \left[\frac{1}{F(r)} \frac{dF(r)}{dr}\right]_{r = r_i} = \left[\frac{d \log_e F(r)}{dr}\right]_{r = r_i}$$
where G and H can be expressed in terms of the seminvariants by partial
differentiation of (27), (28) with regard to u, namely

$$G = \mu_1 - \mu_2 u + \frac{\mu_3}{2!}(u^2 - v^2) - \frac{\mu_4}{3!}(u^3 - 3uv^2) + \cdots \tag{54}$$

$$H = \mu_2 v - \mu_3 uv + \frac{\mu_4}{3!}(3u^2 v - v^3) - \cdots \tag{55}$$


In the special case that the “zero generation” is composed of N₀ individuals
(or “units”) all born (or “entering”) at time zero, the coefficients Q are correspondingly
simplified in form. For the term in the real root r₁ we have

$$Q_1 = \frac{N_0}{G} \tag{56}$$

Conjugate complex root terms unite in pairs,¹⁴ giving

$$Q' e^{r' t} + Q'' e^{r'' t} = \frac{2 N_0\, e^{ut}}{G^2 + H^2} \left\{ G \cos vt - H \sin vt \right\}. \tag{57}$$
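The expressions (54), (55) and (57) are easily evaluated once a root and the first four seminvariants are in hand. The Python sketch below does so, truncating the series at the fourth seminvariant; the function names are illustrative. The example uses the seminvariants of Table II and the first oscillatory root of Table III.

```python
import math

def g_and_h(u, v, mu):
    """Evaluate (54) and (55), truncated at mu4, for mu = (mu1, mu2, mu3, mu4)."""
    mu1, mu2, mu3, mu4 = mu
    G = (mu1 - mu2 * u + (mu3 / 2.0) * (u * u - v * v)
         - (mu4 / 6.0) * (u ** 3 - 3.0 * u * v * v))
    H = mu2 * v - mu3 * u * v + (mu4 / 6.0) * (3.0 * u * u * v - v ** 3)
    return G, H

def pair_term(t, n0, u, v, G, H):
    """Joint contribution (57) of the conjugate roots u ± iv at time t."""
    amplitude = 2.0 * n0 * math.exp(u * t) / (G * G + H * H)
    return amplitude * (G * math.cos(v * t) - H * math.sin(v * t))

# Seminvariants from Table II; root of order 1 from Table III
mu = (10.0, 6.7192, -1.3007, -12.1228)
G1, H1 = g_and_h(-0.11009, 0.57767, mu)
```

The values so obtained agree with the entries G = 11.1688 and H = 4.1458 of Table III, and at the real root (u = 0, v = 0) the routine returns G = μ₁ and H = 0.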


Unless φ(a) is a normal distribution, the computation of the roots u, v, and
the coefficients G, H, in terms of seminvariants becomes impracticable for higher
order roots, which then have to be computed directly and laboriously from equations
(22), (23). In practice components of very high order will hardly be
needed, nor will their use be warranted, since the high order seminvariants,
which are then involved, are not usually known with sufficient accuracy. An
exception occurs when the φ(a) curve is essentially of the nature of a composite
curve. This is what actually happened in the case of the curve of reproduction
for a human population. For details on this point the reader must be referred
to my paper “The Progeny of a Population Element.”


4. Alternative Representation of the Function B(t). By the application of 
the Hertz-Herglotz solution of the integral equation (6), the evolution of a 
population or aggregate is represented as the resultant of a series of damped 
oscillations. 

Additional insight into the nature of the renewal process is gained by viewing
the total renewals as composed of contributions from successive “generations.”¹⁵





14 For details see A. J. Lotka, The Progeny of a Population Element, p. 892. 

15 In the case of a population the term “generation” calls for no explanation: mother,
daughter and granddaughter, for example, represent three generations; in the case of
industrial replacement, the term is to be understood in this sense, that the original installation
constitutes the original or zero generation, the units introduced to replace disused
units of the zero generation constitute the “first” generation, renewal of these the second,
and so on.

This explanation may seem unnecessary. However, from some correspondence received
by the writer it seems that perhaps some readers have confused the generations thus defined
with successive “cycles” of duration equal to the extreme “length of life” of the units.
With such “cycles” we are not here concerned.
This leads to an alternative representation, in which the evolution of the
aggregate appears as the sum of a series of frequency curves, each corresponding
to the contribution of one generation to the total births or replacements at
time t.¹⁶

In order to realize this second representation we note, first of all, that equation (7)
applies not only to the total births at time t, but, with slight modification, also
to the births in any particular generation. Here it will be convenient to consider
the special case of a zero generation of N₀ individuals (or units) all born
(or installed) at time t = 0.

The births (or renewals) in the “first” generation, that is, offspring of the
zero generation, or renewals of disused units of the zero generation, will be
distributed in time according to the equation

$$B_1(t) = N_0\, \varphi(t). \tag{58}$$

For the second generation, or renewals of disused units of the first generation,
we shall have

$$B_2(t) = \int_0^t B_1(t - a)\, \varphi(a)\, da \tag{59}$$





16 This alternative approach to the problem bears some superficial resemblance to a
method followed by R. Frisch in his article “Sammenhengen mellem primaerinvesteringen
og reinvestering” (Statsekonomisk Tidskrift, 1927, p. 117). Frisch also follows up the
distribution in time of first, second, and higher order replacements, and gives diagrams 
bearing a superficial resemblance to Fig. 4 in the present text. But Frisch’s development 
has otherwise little in common with that here presented. He deals with equipment com- 
posed of various units, with expectation of life varying discontinuously or continuously 
from one unit to another, but fixed at a single value for a given unit. To use one of his own 
examples, it is as if a wooden hammer with a life of one year were always replaced by 
another wooden hammer, also with a life of one year, and so on: while a steel hammer, 
with a life of three years, were always replaced by another steel hammer, also with a life of 
exactly three years. The analogous case in population analysis would be presented by a 
population in which length of life were strictly hereditary, so that a man dying at age 50 
would have a son, grandson, etc., each dying at age 50. In the field of industrial replace- 
ment and in population analysis alike this is a highly unrealistic supposition. 

Needless to say, with these basic assumptions, Frisch’s resulting equations differ funda- 
mentally from those here given, and the distribution curves for successive orders of replace- 
ments, as shown in Frisch’s Fig. 3 do not have the property that the j-th seminvariant of the 
k-th order replacement curve is k times that of the j-th seminvariant of the first order 
curve, except for j=1. The fact is that Frisch’s curves in his Fig. 3 are all similar, except 
for a constant factor applied to the vertical scale and its reciprocal applied to the horizontal 
scale. In this case all the corresponding seminvariants, except the first, are evidently 
unchanged in passing from one curve to the next. Frisch, as a matter of fact, does not 
introduce seminvariants into his discussion at all. The Hertz solution he could not pos- 
sibly introduce, since his fundamental equations are not of a form appropriate for the use of 
the Hertz solution. 

The later sections of Frisch’s paper deal with somewhat more complicated cases, but they
all involve the assumption of “strict heredity,” that is, the assumption that a unit with
length of life v is replaced by another having exactly the same length of life v. At any rate,
that is the understanding I have formed of the Danish text, studied with the assistance of a
native of Scandinavia. All the formulae in the text bear out this understanding.
and, generally, for the (j + 1)th generation¹⁷

$$B_{j+1}(t) = \int_0^t B_j(t - a)\, \varphi(a)\, da. \tag{60}$$

Now, by a well-known property¹⁸ of the Thiele seminvariants, it follows from
(58), (59), (60), that the seminvariants of the distribution-in-time of the births
(or replacements) in the jth generation are simply the j-tuple of the corresponding
seminvariants of the first generation, that is, of φ(t).

Furthermore, it is easily shown that as j, the order of generation, increases,
the distribution of renewals approaches¹⁹ the normal (Gaussian) frequency
distribution.

By virtue of these properties the distribution curves for successive generations
are easily constructed.²⁰
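The additive property of the seminvariants can be verified numerically on a tabulated curve. The Python sketch below builds successive generations by repeated discrete convolution, in the manner of (60), using the replacement column of Table I divided by 100,000. It assumes ages measured by the index of the yearly interval, a convention of the sketch and not of the original; the function names are illustrative.

```python
def convolve(x, y):
    """Discrete convolution of two sequences (the sum analogue of (60))."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out

def mean_and_variance(dist):
    """First and second seminvariants of a distribution indexed by time cell."""
    total = sum(dist)
    mean = sum(k * w for k, w in enumerate(dist)) / total
    var = sum((k - mean) ** 2 * w for k, w in enumerate(dist)) / total
    return mean, var

# Replacements per 100,000 original units in yearly age intervals 0-1 ... 16-17 (Table I)
table_i = [0, 0, 300, 900, 1800, 3000, 5700, 10300, 14100,
           13900, 13800, 13200, 10400, 6300, 3700, 2200, 400]
phi = [x / 100000 for x in table_i]

generations = [phi]                      # first generation, per original unit
for _ in range(4):                       # second to fifth generations
    generations.append(convolve(generations[-1], phi))
```

The mean and the variance of the jth list so obtained are j times those of the first, which is the property stated above for the first two seminvariants.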

The sum total of the contributions of successive generations should, of course,
agree with the expression for the total annual births B(t) at time t given by the
fundamental equation (9). In point of fact, by summing the left and the right
hand members of equations (58), (59), and (60) for all generations up to the
highest, say the n-th, “reproducing” at time t, we find

$$B(t) = \sum_{j=1}^{n+1} B_j(t) = B_1(t) + \int_0^t \sum_{j=1}^{n} B_j(t - a)\, \varphi(a)\, da. \tag{61}$$

Since the n-th is the highest generation contributing,²¹ the value of the integral
in (61) is not changed by writing n + 1 instead of n as the upper limit of the
summation sign on the right. But then (61) becomes simply

$$B(t) = B_1(t) + \int_0^t B(t - a)\, \varphi(a)\, da$$


17 The births in the j-th generation extend at most from t = jα to t = jω, but it is not
necessary to take this into account in writing the limits of the integrals in (60) and corresponding
equations, because the inclusion or exclusion of vanishing terms in the integrand
does not affect the value of the integral. Similar remarks apply to the effect of the limited
range of φ(a). See also footnote 5.

18 For details, see A. J. Lotka, “The Progeny of a Population Element,” American
Journal of Hygiene, 1928, vol. 8, p. 875; also “The Spread of Generations,” Human Biology,
1929, vol. 1, p. 305.

19 In practice quite rapidly, even if g(a) is far from normal. 

20 For the case in which φ(a) is a Pearson Type I curve, details of the process are given in
my paper “Industrial Replacement,” Skandinavisk Aktuarietidskrift, 1933, p. 51. I may
here remark that such a Pearson Type I curve for the distribution in the first generation
does not strictly give again a Pearson Type I curve in the second generation, because the
moments beyond the 4th are neglected in fitting such a curve. But it must be remembered
that the same neglect is practiced in the original fit of the data, so that the fit in the second
generation will in general be as adequate as that in the first, provided, of course, that
proper attention is paid to Pearson’s criteria.

21 The special case that the limiting n so defined is ∞ would require special discussion,
which, however, presents no great difficulty. As this case is of little if any practical im- 
portance, this discussion is here omitted. 
that is, summation of the contributions of individual generations to the total
annual births leads us back to the fundamental equation (9), which confirms
the correctness of our analysis.
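The agreement just established can also be checked in discrete form. The Python sketch below solves a step-by-step analogue of (12), per original unit, and compares it with the sum of the successive generations obtained by repeated convolution. It assumes a tabulated curve with no replacements in the first time cell; the function names and the three-term curve in the example are hypothetical.

```python
def convolve(x, y):
    """Discrete convolution of two sequences."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out

def renewal_sequence(phi, horizon):
    """Discrete analogue of (12): b[t] = phi[t] + sum over a of b[t-a] * phi[a]."""
    if phi[0] != 0:
        raise ValueError("this sketch assumes no replacement in the first cell")
    b = []
    for t in range(horizon):
        own = phi[t] if t < len(phi) else 0.0
        last_age = min(t, len(phi) - 1)
        b.append(own + sum(b[t - a] * phi[a] for a in range(1, last_age + 1)))
    return b

def generation_sum(phi, horizon):
    """Sum of the generation curves of (58) to (60), truncated at `horizon` cells."""
    total = [0.0] * horizon
    gen = list(phi)
    for _ in range(horizon):   # generation j cannot appear before cell j
        for t, w in enumerate(gen[:horizon]):
            total[t] += w
        gen = convolve(gen, phi)[:horizon]
    return total

phi = [0.0, 0.5, 0.5]          # hypothetical curve: half replaced at age 1, half at age 2
direct = renewal_sequence(phi, 12)
by_generations = generation_sum(phi, 12)
```

The two sequences coincide term by term, which is the discrete counterpart of the return from (61) to equation (9).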


TABLE I
Age Schedule of Survivorship and of Replacements²² in First Generation

                 Survivors from Original
                 Installation at Beginning     Replacements Within
Age Interval     of Specified Age Interval     Specified Age Interval
    0-1                  100,000                         -
    1-2                  100,000                         -
    2-3                  100,000                       300
    3-4                   99,700                       900
    4-5                   98,800                     1,800
    5-6                   97,000                     3,000
    6-7                   94,000                     5,700
    7-8                   88,300                    10,300
    8-9                   78,000                    14,100
    9-10                  63,900                    13,900
   10-11                  50,000                    13,800
   11-12                  36,200                    13,200
   12-13                  23,000                    10,400
   13-14                  12,600                     6,300
   14-15                   6,300                     3,700
   15-16                   2,600                     2,200
   16-17                     400                       400


5. Application to Kurtz’s data. An extensive collection of numerical data
(mortality curves) on renewal of industrial equipment has been published by
E. B. Kurtz (1930), (1931). By way of example the analysis developed above
has been applied to the data “Group III,” as fitted by him with a Pearson
Type I curve, namely²³


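$$B_1(t) = 14{,}950 \left(1 + \frac{t - 10}{a_1}\right)^{k_1} \left(1 - \frac{t - 10}{a_2}\right)^{k_2} \tag{62}$$

[The numerical values of the constants a₁, a₂ and of the exponents k₁, k₂ are illegible in this copy.]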


22 Data from E. B. Kurtz, Life Expectancy of Physical Property, 1930, Table 22, Cols. 5 
and 6, p. 86, and p. 104, Fig. 50. 

23 The numerical values of the constants in the formula as here given differ slightly from 
those given by Kurtz, perhaps owing to the retention by him of higher decimals in his 
computations. There is also an inconsistency between Kurtz’s use of 10 for the mean in 
his formula, whereas on his drawing the mean is placed at 100. 
The aperiodic component is the number of units originally installed (arbitrarily
assumed as 100,000) divided by the mean of the frequency curve (equation
62). Following Kurtz, this mean has also been arbitrarily made equal to 10,
which simply implies a particular choice of time unit. The fundamental data
and characteristics are set forth in Tables I and II. The first six oscillatory
components were computed retaining moments and seminvariants up to μ₄,
with the results shown in Table III and in Figs. 2 (right hand panel), 3 and 4.


TABLE II
Moments and Seminvariants of Curve of Replacements in First Generation²⁴

  j      Moments²⁵ mⱼ      Seminvariants μⱼ
  0          100,000
  1                0           10 (note 26)
  2          671,924            6.7192
  3         −130,070           −1.3007
  4       12,323,200          −12.1228





TABLE III
Constants of the Series Solution (19) of Integral Equation (7) for First Six
Oscillatory Components Computed from First Four Moments and
Seminvariants of an Industrial Replacement Curve²⁷

Order of
Component       u          v          G          H       G/(G²+H²)   H/(G²+H²)
    0           0          0       10.0000     0          .10000      0
    1        −.11009     .57767    11.1688     4.1458     .07869      .02921
    2        −.30144     .98920    14.3353     7.6696     .05423      .02902
    3        −.46500    1.28383    18.4982    10.4425     .04100      .02314
    4        −.59500    1.51475    23.1094    12.7773     .03314      .01832
    5        −.69800    1.70500    29.2088    14.8877     .02718      .01385
    6        −.78000    1.86117    32.5165    16.7797     .02429      .01253


In particular, Fig. 4 shows the curve obtained by the summation of the first 
six oscillatory components superposed over the aperiodic (constant) component. 
It also shows the distribution curves of the first five generations within the 
range of the time scale on the diagram. Summation of these reproduces, 


24 Data from E. B. Kurtz, Life Expectancy of Physical Property, 1930, Table 22, p. 86, and 
Fig. 50, p. 104. 

25 Moments taken about age 10.

26 This value of μ₁ is taken with reference to the origin.

27 Data from E. B. Kurtz, Life Expectancy of Physical Property, 1930, p. 104, fig. 50. 
within the errors of drawing, the resultant curve of the oscillatory solution,
except for the very early stages of the process, where the oscillatory solution is
of no practical interest, because the first generation alone dominates the whole
process, and this is given by the observational data direct or after fitting with
a curve such as (62).

It remains to consider briefly the relative advantages of the method of solution 
by differentiation, as originally applied by Herbelot, Risser, and others, 
on the one hand, and the use of the Hertz-Herglotz expansion, as introduced 
for the treatment of this type of problem by Sharpe and Lotka, on the other. 

One obvious advantage of the method of differentiation, when it is applicable, 
is that the result is obtained in the form of a closed, finite expression for 
each cycle. 

Against this is to be reckoned, first, that the range of application of the 
method is severely limited. Preinreich in a recent issue of Econometrica (1938) 
uses for an illustration of the method a Pearson Type I curve, but in the very 
special and trivial form that the exponents are integers, namely 1 and 2. In 
practice the exponents will always be fractional, and then successive differen- 
tiations do not terminate as obligingly as in Preinreich’s case. As already 
noted, Preinreich, though citing Kurtz’s observational data on industrial re- 
placement, discreetly abstains from using these for his numerical example. 

Secondly, the disadvantage of a solution in the form of an infinite series is more 
apparent than real. In practice the first few terms of the series obtained by 
the Hertz-Herglotz method will usually give an adequate representation of the 
facts, except for a short period immediately following the first installation. It 
is true that here this method, unless carried to high order components, may 
give an imperfect representation of industrial replacements, and may, in fact, 
give impossible negative values in this region, as in the example exhibited in 
Fig. 4. But this is practically unimportant, because in practice there will 
actually be few, if any, such very early replacements in an installation of finite 
dimensions. In fact, second and higher order replacements immediately after 
first installation are obviously out of the question in practice. For example, it 
may well happen once in a while that a telegraph pole is demolished on the 
very first day of service by collision with a truck. It is even imaginable that 
its replacement, put up the same day, might again be immediately demolished. 
But even in a country-wide installation one would hardly expect a third, fourth 
or fifth replacement to be required on the day of installation. In other words, 
that part of the replacement curve which relates to the very early period after 
first installation, is composed practically of first replacements only. 

So, for example, in the diagram, Fig. 4, the curve of total replacements, up to 
about t = 8, is simply the curve of first replacements, which is given directly 
by the data of the problem. Within the range of errors of drawing the influence 
of higher components is quite unobservable in this region. 
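
The dominance of first replacements in the early period can be illustrated numerically. In the sketch below each generation of replacements is obtained as a repeated discrete convolution of the first-generation frequency curve with itself, and the share of first replacements in the total is then computed. The frequencies in `FIRST` are hypothetical round numbers chosen for illustration (they are not Kurtz's data), and the function names are likewise assumptions of this sketch.

```python
def convolve(f, g, n):
    """First n terms of the discrete convolution of the sequences f and g."""
    out = [0.0] * n
    for i, fi in enumerate(f[:n]):
        for j, gj in enumerate(g[:n - i]):
            out[i + j] += fi * gj
    return out


def generations(first, n_gen, n):
    """Return [gen 1, gen 2, ...]; generation k is the k-fold convolution of `first`."""
    first = (list(first) + [0.0] * n)[:n]
    gens = [first]
    for _ in range(n_gen - 1):
        gens.append(convolve(gens[-1], first, n))
    return gens


# Hypothetical first-generation replacement frequencies per unit of time
# (illustrative only); they sum to 1.
FIRST = [0.0, 0.01, 0.02, 0.04, 0.08, 0.15, 0.20,
         0.20, 0.15, 0.08, 0.04, 0.02, 0.01]

N = len(FIRST)
gens = generations(FIRST, n_gen=5, n=N)
total = [sum(g[t] for g in gens) for t in range(N)]
# Share of first replacements in the total at each time (1.0 where nothing is replaced).
share_first = [gens[0][t] / total[t] if total[t] else 1.0 for t in range(N)]

if __name__ == "__main__":
    for t in range(N):
        print(t, round(total[t], 5), round(share_first[t], 4))
```

With these illustrative figures the total is practically indistinguishable from the first-generation curve at small t, and the higher generations become appreciable only later, which is the behavior described for Fig. 4.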

The case is even more favorable in the application of the method to the 
problem of population growth, for here there is actually no reproduction whatever 
until age a (say about 15) is reached. The part of the curve defined by 
the series (19) carried only to a finite number of terms,²⁸ and applied to values 
of t < a, is therefore simply rejected.²⁹ It may save many words of explanation 
if the reader is simply referred to Fig. 4 on p. 897 of my previous publication 
“The Progeny of a Population Element,” which illustrates the point, the 
minimum age of reproduction being just short of 15. 

A major disadvantage of the method by differentiation is that it demands 
that the frequency distribution function g(a) be given in the form of a suitable 
analytic expression, or if it is not so given, that a suitable function or curve be 
fitted to it. The Hertz-Herglotz method, on the contrary, is directly applicable 
to the raw data, regardless of their form. Incidentally, curve fitting as practiced 
by Kurtz may produce a singular result. In 6 out of 7 of his types, the fitted 
frequency curve extends into the negative field, implying that there are some 
replacements even before the actual installation. This may not be a very 
serious defect if the area of the curve in the negative field is negligible, but it 
should not pass unnoticed. 

One of the principal merits of the Hertz-Herglotz expansion is that it renders 
the course of events over their whole extent, and, in particular, makes clear 
the mode of approach to the ultimate state represented by the aperiodic term. 
Because the method by differentiation requires a separate expression for each 
cycle, it is at best ill adapted to present to the eye or to the mind a compre- 
hensive view of the evolution of the aggregate as a whole. 

In the introductory paragraphs it was pointed out that the problems of popu- 
lation growth and those of industrial replacement were closely analogous, though 
there were certain points of difference. It is of interest here to give considera- 
tion to these differences. 

One of these has already been noted. Replacement of industrial equipment 
may begin from the very moment of first installation, since accident as well as 
wear and tear must be provided for. Organic reproduction, on the other hand, 
does not occur immediately after birth. One result of this is that for any finite 
value of t, the number of generations contributing to the total births is itself 
finite; on the contrary, in the case of industrial replacement, if we interpret the 
equation (7) literally, there are at any moment an infinite number of generations 
contributing. In practice this, of course, does not occur, and the equation 
does not truly represent the facts in that a continuous distribution is assumed 
throughout, whereas for the higher order replacements ultimately the early 
frequencies are so thinned out that the discreteness of the units can no longer 
be disregarded. 

28 There are, of course, limitations to the application of the solution (19). No one with 
any experience in the treatment of practical problems by mathematical analysis would 
think of fitting, by means of a reasonably limited number of terms, the first phases of the 
processes here discussed, in the case of a rectangular distribution of the first generation, 
for example. But the distributions with which we are actually concerned in practice are 
far from rectangular. Such as they are, they are well adapted to the method, as is seen in 
the two examples illustrated. 

29 There is nothing unusual in this rejection of negative values of the frequency function 
where it falls outside the range of actual values. It is what we all do in using such a frequency 
curve as Pearson’s type I, defined by a function which becomes negative outside 
the range of actual interest. 

Nevertheless, from the very start we must be prepared to consider several 
generations of replacement as contributing to the total; this lends a certain 
special interest, in dealing with the first cycle of replacements, to the method of 
solution by differentiation, as used by Herbelot, Risser, Zwinggi, Schulthess, 
and lately Preinreich. It is true that this interest is much diminished by the 
limitations in the applicability of the method. 

On the other hand, in the case of organic reproduction, for the early part 
of the first cycle, the progeny of a population element belongs exclusively to a 
single (“first”) generation. Between t = 15 and t = 30, in our example, only 
first generation births are taking place, and here the solution (19) is of more 
theoretical than practical interest, since the distribution of births is simply that 
of the first generation births. 

Another point of difference is that the curve of φ(a) in the case of industrial 
replacement, if we may judge by Kurtz’s data, is a comparatively well behaved 
Pearson type curve. On the contrary, the corresponding curve of organic reproduction 
is a very inconvenient type to fit by any of the standard methods. 
In view of this it is all the more remarkable that the solution (19) gives as good 
a fit as it does with only four components, as will be seen on referring to my 
original publication, “The Progeny of a Population Element,” p. 897, Fig. 4, 
already referred to. 

Lastly, while the analogy is exact so long as we are dealing with industrial or 
organic aggregates maintained at a constant level, an essential difference arises 
when the case of a growing aggregate is considered. Organic growth takes place 
by what might be called ‘‘multiple replacement,” that is, one individual in the 
course of life gives rise, on the average, to n individuals, where n may exceed 
unity. Analytically this finds expression in that 


\[ \int_0^\infty p(a)\, m(a)\, da > 1 \]


and the fact is automatically taken care of in the solution (19) by the fact that 
in such a case the single real root r > 0. 
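
The connection between the integral exceeding unity and the sign of the real root can be checked numerically. The sketch below assumes the usual characteristic equation of this theory, namely that the real root r satisfies ∫ e^{−ra} p(a) m(a) da = 1, replaces the integral by a sum over age classes, and finds r by bisection. The schedule `PHI` of values of p(a)m(a) is hypothetical, and the function names are assumptions of this illustration.

```python
import math


def net_reproduction(phi, da):
    """Discrete approximation to the integral of p(a) m(a) da."""
    return sum(phi) * da


def real_root(phi, ages, da, lo=-1.0, hi=1.0, tol=1e-12):
    """Real root r of sum(exp(-r a) phi(a)) da = 1, found by bisection.

    The left member decreases steadily as r increases when phi >= 0,
    so there is a single real root inside a bracket that changes sign.
    """
    def f(r):
        return sum(math.exp(-r * a) * p for a, p in zip(ages, phi)) * da - 1.0

    assert f(lo) > 0 > f(hi), "bracket [lo, hi] must enclose the root"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical schedule: five-year age classes from 15 to 45 (midpoints),
# with p(a) m(a) per year of age; the integral is 0.24 * 5 = 1.2.
AGES = [17.5, 22.5, 27.5, 32.5, 37.5, 42.5]
PHI = [0.02, 0.06, 0.07, 0.05, 0.03, 0.01]
DA = 5.0

r_growing = real_root(PHI, AGES, DA)
r_declining = real_root([0.75 * p for p in PHI], AGES, DA)  # integral 0.9

if __name__ == "__main__":
    print(net_reproduction(PHI, DA), r_growing, r_declining)
```

When the integral exceeds 1 the root found is positive, when it falls short of 1 the root is negative, and when it equals 1 the root is zero, corresponding to an aggregate maintained at a constant level.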

Growth of industrial equipment, on the other hand, takes place by new units 
being installed in addition to replacement of disused units. The fundamental 
equations must be altered accordingly to take care of this case. 

In conclusion I want to make a remark regarding the function of such analyses 
as the one here presented. In this connection I can do no better than to quote 
a sentence from Cournot:³⁰ “Those skilled in mathematical analysis know that 
its object is not simply to calculate numbers, but that it is also employed to find 
the relations between magnitudes. . . .” 

30 A. Cournot, Researches into the Mathematical Principles of the Theory of Wealth, translated 
by N. Bacon, Macmillan Co., 1897, p. 3. 

It is essentially in this sense that the analysis of a problem of industrial re- 
placement is here offered. If we are merely interested in numbers, the direct 
arithmetical approach as practiced by Kurtz may be as good as any. But if an 
insight into the anatomy of the processes involved, and into their evolution from 
an initial condition to a final state is desired, then the setting up of the funda- 
mental equations, and their solution in exponential series or in other suitable 
analytical form, and a concise expression of the relation between the distributions 
in time of successive generations, or orders of replacements, have greatly superior 
merit as compared with brute attacks by arithmetic without regard to mathe- 
matical form. Nor are the systematic relations (in terms of certain seminvari- 
ants) that have been shown to exist between the distribution of successive 
generations to be regarded merely as “short cuts” for their computation, though 
sometimes they may be found convenient in that way. Their real significance 
lies in that they serve to complete for us the analytical picture of the process of 
evolution of the system under consideration. 


METROPOLITAN LIFE INSURANCE COMPANY, 
New York, N. Y. 
ON THE MATHEMATICS OF THE REPRESENTATIVE METHOD 
OF SAMPLING¹ 


By Allen T. Craig 


1. Introduction. This paper is designed to present certain topics in mathe- 
matical statistics which find application in some of the problems that arise in 
what has been termed the representative method of sampling. 

For descriptive purposes, it seems convenient to consider two aspects of the 
representative method. The first of these may be called the method of pur- 
posive selection. This method can be roughly characterized by saying that it is 
the method employed when the samples are chosen in such a way that each 
sample will possess one or more characters, say certain averages, which are 
identical with the corresponding characters in the population from which the 
samples are drawn. The mathematical conditions which underlie this method 
are rather stringent, and both theoretical and practical investigations seem to 
have proved that in general no great amount of confidence can be placed in the 
results obtained. 

The second aspect of the representative method has been styled the method 
of random sampling. This method can take either of two forms which we may 
call the method of unrestricted random sampling and stratified random sampling. 
The first of these is the classical method of procedure. That is, a sample is 
drawn at random from a given population and on the basis of these data infer- 
ences are made concerning the nature of the population. On the other hand, 
when the method of stratified random sampling is used, the population is first 
separated into a large number of parts, called strata, and the sample consists 
of an equally large number of “partial samples,” each partial sample being 
drawn from a different stratum. It appears, both from theoretical and prac- 
tical results, that this method of stratified random sampling enjoys many 
advantages not shared by the other methods. 

We now turn to the main purpose of this paper, namely that of enumerating 
some of the theorems and methods of mathematical statistics which serve useful 
purposes in this theory. Discussion of how these theorems find application in 
the method itself has been reserved for other participants on this program. 


2. Estimates. From our preliminary remarks, it is apparent that the representative 
method is much concerned with the problem of estimating certain 
unknown parameters of a statistical population. On this account, we first consider 
the problem of estimates. 

1 Presented, at the invitation of the program committee, to a joint session of the Institute 
of Mathematical Statistics and the American Statistical Association on December 
29, 1938. 


Consider a population with arithmetic mean $m$ and standard deviation $\sigma$. 
Let $x_1, x_2, \ldots, x_n$ be $n$ independent items drawn from this population and 
let $c_1, c_2, \ldots, c_n$ be any finite real constants, not all zero to avoid the trivial 
case. Write $y = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$. Then the expected or arithmetic 
mean value of $y$ is 

\[ \bar{y} = E(y) = m(c_1 + c_2 + \cdots + c_n), \]

and the variance of $y$ is 

\[ \sigma_y^2 = E[(y - \bar{y})^2] = \sigma^2 (c_1^2 + \cdots + c_n^2). \]

Suppose we inquire into the probability that $y$ will have a value which is within 
a preassigned $\epsilon$ of its expected value. To this end, let $C$ be the numerical 
value of the numerically greatest of the set $c_1, \ldots, c_n$, so that $\sigma_y^2 \le n\sigma^2 C^2$. 

Then by Tchebycheff’s inequality $p$, the probability that $|y - \bar{y}| < \epsilon$, where $\epsilon$ 
is an arbitrarily small positive number, is such that 

\[ p \ge 1 - \frac{\sigma_y^2}{\epsilon^2}, \]

or 

\[ p \ge 1 - \frac{n\sigma^2 C^2}{\epsilon^2}. \]

In general, this inequality will have little interest. But if $C$ is of the form 
$M/n^{(1+\delta)/2}$, $M$ independent of $n$, $\delta > 0$, then $p \ge 1 - \sigma^2 M^2/(n^{\delta}\epsilon^2)$ and by increasing $n$ 
the right member can be made as near to one as we please. This means then 
that if we have a population with a finite variance and if we construct a linear 
function of the observations with coefficients of the nature indicated, we can, 
by increasing the size of the sample, make the probability approach one that 
the linear function will have a value arbitrarily close to its expected value.² 
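
The behavior of the bound as n increases may be seen in a small computation. The sketch below evaluates both the bound in terms of the variance of y and the coarser bound in terms of C for the arithmetic mean, where every coefficient equals 1/n (so that M = 1 and δ = 1 in the notation above). The function names and the numerical values of σ and ε are assumptions chosen for illustration.

```python
def chebyshev_lower_bound(sigma, coeffs, eps):
    """1 - sigma_y^2 / eps^2, with sigma_y^2 = sigma^2 * sum(c_i^2)."""
    var_y = sigma ** 2 * sum(c * c for c in coeffs)
    return 1.0 - var_y / eps ** 2


def coarse_lower_bound(sigma, coeffs, eps):
    """1 - n sigma^2 C^2 / eps^2, where C is the numerically greatest coefficient."""
    n = len(coeffs)
    big_c = max(abs(c) for c in coeffs)
    return 1.0 - n * sigma ** 2 * big_c ** 2 / eps ** 2


def mean_coefficients(n):
    """Coefficients of the arithmetic mean: c_i = 1/n for every i."""
    return [1.0 / n] * n


if __name__ == "__main__":
    sigma, eps = 2.0, 0.5  # illustrative values only
    for n in (100, 400, 1600, 6400):
        print(n, chebyshev_lower_bound(sigma, mean_coefficients(n), eps))
```

For the arithmetic mean the two bounds coincide, since all coefficients are numerically equal; for unequal coefficients the bound in terms of C is the weaker of the two. A negative value of the bound simply means that the inequality gives no information for that n.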
Now suppose that instead of constructing an arbitrary linear function we 
attempt to construct a function which will be an estimate of some particular 
parameter of the population. If the estimate is to be most serviceable, we 
should like to be able, by governing the size of the sample, to be as certain as 
we like that the estimate will have a value arbitrarily near that of the parameter. 
The preceding discussion shows that we can best achieve this by requiring that 
the expected value of the estimate be equal to the parameter sought. An 
estimate such as that just described is frequently called an unbiased estimate. 
The use of such estimates in statistical problems makes it possible to avoid 
systematic errors in estimating parameters. In general, unique unbiased estimates 
of a parameter do not exist. For example, the arithmetic mean $m$ of 
the population can be estimated from the sample $x_1, \ldots, x_n$ by any one of a 
large number of unbiased estimates such as $(x_1 + x_2 + \cdots + x_n)/n$, $(x_1 + x_n)/2$, 
$x_1$, and so on without limit. Thus it becomes necessary to make a choice of 
the unbiased estimate to be used. An appropriate criterion is that the unbiased 
estimate whose distribution has the smallest variance is the best to use. The 
reason for this can be seen by examining the preceding formula $p \ge 1 - \sigma_y^2/\epsilon^2$. 
For if $y_1$ and $y_2$ are two unbiased estimates of the same parameter and if 
$\sigma_{y_1}^2 < \sigma_{y_2}^2$, then in $p_1 \ge 1 - \sigma_{y_1}^2/\epsilon^2$ and $p_2 \ge 1 - \sigma_{y_2}^2/\epsilon^2$ we see that $1 - \sigma_{y_1}^2/\epsilon^2$ is more 
nearly equal to one than is $1 - \sigma_{y_2}^2/\epsilon^2$. Because of this fact we prefer, at least 
in most problems, to use $y_1$ rather than $y_2$ as an estimate of the unknown 
parameter. An unbiased estimate whose sampling variance is a minimum is 
sometimes called a best estimate.³ It should not be inferred that the word 
“best” has any implications other than those stated explicitly in the definition. 

2 Under these conditions, the function of the observations is said to converge stochastically 
to its expected value. 
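
The choice among unbiased estimates of the mean can be made concrete. For a linear function with coefficients $c_i$ the expected value is $m \sum c_i$ and the variance is $\sigma^2 \sum c_i^2$, so an estimate is unbiased for $m$ when the coefficients sum to one, and its variance is $\sigma^2$ multiplied by the sum of their squares. The sketch below tabulates that multiplier for the three estimates named in the text, using a sample of n = 5; the names used are assumptions of this illustration.

```python
from fractions import Fraction


def is_unbiased_for_mean(coeffs):
    """E(y) = m * sum(c_i), so y is unbiased for m when the coefficients sum to 1."""
    return sum(coeffs) == 1


def variance_factor(coeffs):
    """Var(y) / sigma^2 = sum(c_i^2) for independent items with common variance."""
    return sum(c * c for c in coeffs)


n = 5
ESTIMATES = {
    "arithmetic mean of all items": [Fraction(1, n)] * n,
    "mean of first and last items": [Fraction(1, 2)] + [Fraction(0)] * (n - 2) + [Fraction(1, 2)],
    "first item alone": [Fraction(1)] + [Fraction(0)] * (n - 1),
}

if __name__ == "__main__":
    for name, coeffs in ESTIMATES.items():
        print(name, is_unbiased_for_mean(coeffs), variance_factor(coeffs))
```

All three are unbiased, but the variance multipliers are 1/5, 1/2 and 1 respectively, so that by the criterion stated the arithmetic mean of all the items is to be preferred among these three.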

The question very naturally arises as to whether we can determine these 
best estimates in particular cases. In general we can not determine them, but 
under certain conditions we can find best estimates if we are dealing with linear 
functions of the observations. A method and the conditions are set forth in 
an important theorem due to Markoff. We now consider his method. 


3. Markoff’s Method. Let there be given $n$ statistical populations with 
arithmetic means $m_1, m_2, \ldots, m_n$ and standard deviations $\sigma_1, \sigma_2, \ldots, \sigma_n$ 
respectively. We assume that no correlation exists between any of the populations. 
Furthermore, suppose that each of the $n$ arithmetic means can be 
expressed linearly in terms of $k$ unknown, but unique, parameters, say 
$z_1, z_2, \ldots, z_k$. Thus 

\[
\begin{aligned}
m_1 &= a_{11} z_1 + a_{12} z_2 + \cdots + a_{1k} z_k, \\
m_2 &= a_{21} z_1 + a_{22} z_2 + \cdots + a_{2k} z_k, \\
&\;\;\vdots \\
m_n &= a_{n1} z_1 + a_{n2} z_2 + \cdots + a_{nk} z_k,
\end{aligned}
\tag{1}
\]

where the $a$’s are known constants. Likewise, let $T$ be a parameter which is 
expressible linearly in terms of the same $k$ unknown parameters, say $T = 
b_1 z_1 + b_2 z_2 + \cdots + b_k z_k$, where the $b$’s are given constants. We draw a sample 
of $n$ independent items, $x_1, x_2, \ldots, x_n$, in which one item is drawn from each 
of the $n$ populations. On the basis of this sample we seek to determine a set 
of numbers $\lambda_1, \lambda_2, \ldots, \lambda_n$ such that $T' = \lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_n x_n$ is the 
best estimate of $T$. 

3 An estimate of a parameter which converges stochastically (cf. footnote (2)) to that 
parameter is called a consistent estimate of the parameter. If a consistent estimate has a 
distribution which is normal for large samples and if the variance of that distribution is 
smaller than the variance of any other consistent estimate which also has a normal distribution 
for large samples, then the estimate is called efficient. It should be observed that 
our definition of best estimate requires an unbiased estimate, whereas consistent and 
efficient estimates may be biased. 

Before attempting to find the solution, if one exists, let us first examine the 
mathematical implications of the problem. In the first place, in order that 
parameters $z_1, \ldots, z_k$ may exist, it is necessary and sufficient that the matrices 
$A$ and $B$, where 

\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1k} \\
a_{21} & a_{22} & \cdots & a_{2k} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nk}
\end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1k} & m_1 \\
a_{21} & a_{22} & \cdots & a_{2k} & m_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nk} & m_n
\end{pmatrix},
\]

have the same rank. Thus we require that $A$ and $B$ have the common rank $R$. 
This being satisfied, we note further that if $k > n$, there will be infinitely many 
values of the $z$’s which will satisfy the equations (1). Thus we require in addition 
that $k \le n$. Finally, we note that if the common rank $R$ is less than $k$, 
there will be infinitely many values of the $z$’s which will satisfy the system (1). 
Hence we must have $R = k \le n$. 

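We now turn to a consideration of the solution of the problem. Whatever 
the values of the $\lambda$’s, we have for the mean value and the variance of $T'$ 

\[
\begin{aligned}
E(T') &= \lambda_1 m_1 + \cdots + \lambda_n m_n \\
      &= \lambda_1 \sum_j a_{1j} z_j + \cdots + \lambda_n \sum_j a_{nj} z_j,
\end{aligned}
\]

and 

\[ \sigma_{T'}^2 = \lambda_1^2 \sigma_1^2 + \cdots + \lambda_n^2 \sigma_n^2, \]

respectively. Since $E(T')$ must equal $T$ as a part of the condition for a best 
estimate, then 

\[ \lambda_1 \sum_j a_{1j} z_j + \cdots + \lambda_n \sum_j a_{nj} z_j = b_1 z_1 + \cdots + b_k z_k \]

identically in the $z$’s. That is, the coefficients of $z_1, \ldots, z_k$ in the left member 
must equal the corresponding coefficients in the right member. Accordingly, 

\[
\begin{aligned}
a_{11}\lambda_1 + a_{21}\lambda_2 + \cdots + a_{n1}\lambda_n &= b_1, \\
a_{12}\lambda_1 + a_{22}\lambda_2 + \cdots + a_{n2}\lambda_n &= b_2, \\
&\;\;\vdots \\
a_{1k}\lambda_1 + a_{2k}\lambda_2 + \cdots + a_{nk}\lambda_n &= b_k.
\end{aligned}
\tag{2}
\]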


If these equations are to have solutions for $\lambda_1, \ldots, \lambda_n$, we must make the 
additional assumption that the matrix $C$, where 

\[
C = \begin{pmatrix}
a_{11} & a_{21} & \cdots & a_{n1} & b_1 \\
a_{12} & a_{22} & \cdots & a_{n2} & b_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{1k} & a_{2k} & \cdots & a_{nk} & b_k
\end{pmatrix},
\]

has the same rank as the matrix of the coefficients, namely $R$. If this condition 
is satisfied we can write equations (2) in the form 

\[
\begin{aligned}
a_{11}\lambda_1 + \cdots + a_{k1}\lambda_k &= b_1 - a_{k+1,1}\lambda_{k+1} - \cdots - a_{n1}\lambda_n, \\
&\;\;\vdots \\
a_{1k}\lambda_1 + \cdots + a_{kk}\lambda_k &= b_k - a_{k+1,k}\lambda_{k+1} - \cdots - a_{nk}\lambda_n,
\end{aligned}
\tag{3}
\]

and solve for $\lambda_1, \ldots, \lambda_k$ in terms of the $a$’s, the $b$’s, and $\lambda_{k+1}, \ldots, \lambda_n$. Here, 
without any essential loss of generality, we take the non-vanishing $k$-rowed 
determinant to be that of the coefficients of $\lambda_1, \ldots, \lambda_k$ in equations (2). Thus 
for arbitrarily assigned values of $\lambda_{k+1}, \ldots, \lambda_n$, we can compute the values of 
$\lambda_1, \ldots, \lambda_k$ and these $n$ values of the $\lambda$’s will give us a $T'$ which is an unbiased 
estimate of $T$. That there will be, in general, an unlimited number of sets of 
values of the $\lambda$’s is in keeping with our previous observation that unique unbiased 
estimates usually do not exist. 

The next part of the problem will consist in determining which, if any, 
of the above sets of $\lambda$’s will make $\sigma_{T'}^2$ a minimum. We recall that $\sigma_{T'}^2 = 
\lambda_1^2\sigma_1^2 + \cdots + \lambda_n^2\sigma_n^2$. In $\sigma_{T'}^2$ replace $\lambda_1, \ldots, \lambda_k$ by their values (in 
terms of $\lambda_{k+1}, \ldots, \lambda_n$) which we obtained by solving the system (3). Then 
$\sigma_{T'}^2$ will be expressed in terms of $\sigma_1, \ldots, \sigma_n$, the $a$’s, the $b$’s, and $\lambda_{k+1}, \ldots, \lambda_n$. 
We next take the partial derivative of $\sigma_{T'}^2$ with respect to each of $\lambda_{k+1}, \ldots, \lambda_n$. 
On equating these partial derivatives to zero we will have a system of $n - k$ 
linear equations in the $n - k$ unknowns $\lambda_{k+1}, \ldots, \lambda_n$. If these equations 
yield unique values for $\lambda_{k+1}, \ldots, \lambda_n$, they will in turn determine unique 
values of $\lambda_1, \ldots, \lambda_k$. This gives us a unique set of $\lambda$’s such that at one and 
the same time $E(T') = T$ and $\sigma_{T'}^2$ is a minimum. 
The procedure which we have just outlined is most tedious to carry out in a 
particular case. Because of the insight of Markoff, a much better scheme is 
available for finding the best estimate of $T$. Consider the function of 
$z_1, \ldots, z_k$, 

\[ F(z_1, \ldots, z_k) = \sum_j \left( \frac{x_j - a_{j1} z_1 - \cdots - a_{jk} z_k}{\sigma_j} \right)^2 . \]

Evaluate $\partial F/\partial z_1, \ldots, \partial F/\partial z_k$ and equate these partial derivatives to zero. This yields 
the following system of $k$ linear equations in the $k$ unknowns $z_1, \ldots, z_k$. 

\[
\begin{aligned}
z_1 \sum_j \frac{a_{j1} a_{j1}}{\sigma_j^2} + \cdots + z_k \sum_j \frac{a_{j1} a_{jk}}{\sigma_j^2} &= \sum_j \frac{a_{j1} x_j}{\sigma_j^2}, \\
&\;\;\vdots \\
z_1 \sum_j \frac{a_{jk} a_{j1}}{\sigma_j^2} + \cdots + z_k \sum_j \frac{a_{jk} a_{jk}}{\sigma_j^2} &= \sum_j \frac{a_{jk} x_j}{\sigma_j^2}.
\end{aligned}
\tag{4}
\]

If the system (4) yields unique values for the $z$’s, these values, when substituted 
in $T$, yield exactly the same estimate of $T$ as was found by substituting for 
the $\lambda$’s in $T'$. 

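Perhaps an illustration will make this clearer. Suppose we have $n = 2$ 
populations and that the means $m_1$ and $m_2$ are expressible linearly in terms of 
$k = 1$ parameter $z_1$. Our equations (1) become 

\[
\begin{aligned}
m_1 &= a_{11} z_1, \\
m_2 &= a_{21} z_1.
\end{aligned}
\tag{1'}
\]

Similarly, we have $T = b_1 z_1$ and $T' = \lambda_1 x_1 + \lambda_2 x_2$. We first determine the 
$\lambda$’s such that $T'$ is the best estimate of $T$. In accordance with the preceding 
steps, equations (2) become 

\[ a_{11}\lambda_1 + a_{21}\lambda_2 = b_1, \tag{2'} \]

and the system (3) becomes 

\[ \lambda_1 = \frac{b_1 - a_{21}\lambda_2}{a_{11}}. \tag{3'} \]

Then 

\[
\begin{aligned}
\sigma_{T'}^2 &= \lambda_1^2\sigma_1^2 + \lambda_2^2\sigma_2^2 \\
&= \left( \frac{b_1 - a_{21}\lambda_2}{a_{11}} \right)^2 \sigma_1^2 + \lambda_2^2\sigma_2^2,
\end{aligned}
\]

because of (3'). Thus 

\[ \frac{\partial \sigma_{T'}^2}{\partial \lambda_2} = -\frac{2 a_{21} (b_1 - a_{21}\lambda_2)\,\sigma_1^2}{a_{11}^2} + 2\lambda_2\sigma_2^2, \]

and for a minimum $\sigma_{T'}^2$ we write $\partial \sigma_{T'}^2/\partial \lambda_2 = 0$ so that 

\[ \lambda_2 = \frac{a_{21} b_1 \sigma_1^2}{a_{11}^2\sigma_2^2 + a_{21}^2\sigma_1^2}. \]

Since 

\[ \lambda_1 = (b_1 - a_{21}\lambda_2)/a_{11}, \]

then 

\[ \lambda_1 = \frac{b_1 a_{11} \sigma_2^2}{a_{11}^2\sigma_2^2 + a_{21}^2\sigma_1^2}. \]

Our best estimate of $T$ is found from $T'$ and it is 

\[ T' = \frac{b_1 a_{11}\sigma_2^2 x_1 + b_1 a_{21}\sigma_1^2 x_2}{a_{11}^2\sigma_2^2 + a_{21}^2\sigma_1^2}. \]

By Markoff’s method we would form the function 

\[ F(z_1) = \left( \frac{x_1 - a_{11} z_1}{\sigma_1} \right)^2 + \left( \frac{x_2 - a_{21} z_1}{\sigma_2} \right)^2 . \]

The system (4) reduces to merely 

\[ z_1 = \frac{a_{11}\sigma_2^2 x_1 + a_{21}\sigma_1^2 x_2}{a_{11}^2\sigma_2^2 + a_{21}^2\sigma_1^2}. \tag{4'} \]

We substitute this value of $z_1$ in $T = b_1 z_1$ and obtain 

\[ T = \frac{b_1 a_{11}\sigma_2^2 x_1 + b_1 a_{21}\sigma_1^2 x_2}{a_{11}^2\sigma_2^2 + a_{21}^2\sigma_1^2}, \]

which is the estimated value $T'$ above. 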
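
The agreement of the two routes in this illustration can be verified with exact arithmetic. The sketch below computes the $\lambda$’s of the direct minimization, the estimate obtained by minimizing $F(z_1)$, and the variance of an arbitrary unbiased $T'$. The numerical values of the constants and observations are hypothetical, and the function names are assumptions of this illustration.

```python
from fractions import Fraction as Fr


def best_lambdas(a11, a21, b1, s1sq, s2sq):
    """Lambdas that make T' unbiased for T = b1*z1 with minimum variance."""
    d = a11 ** 2 * s2sq + a21 ** 2 * s1sq
    return b1 * a11 * s2sq / d, b1 * a21 * s1sq / d


def markoff_estimate(a11, a21, b1, s1sq, s2sq, x1, x2):
    """Estimate of T from the z1 minimizing F(z1), the weighted sum of squares."""
    z1 = (a11 * s2sq * x1 + a21 * s1sq * x2) / (a11 ** 2 * s2sq + a21 ** 2 * s1sq)
    return b1 * z1


def variance(lam1, lam2, s1sq, s2sq):
    """Variance of T' = lam1*x1 + lam2*x2 for uncorrelated populations."""
    return lam1 ** 2 * s1sq + lam2 ** 2 * s2sq


def unbiased_lambdas(lam2, a11, a21, b1):
    """Any lam2 together with lam1 from (3') gives an unbiased T'."""
    return (b1 - a21 * lam2) / a11, lam2


# Hypothetical constants and observations, as exact fractions.
A11, A21, B1 = Fr(2), Fr(3), Fr(5)
S1SQ, S2SQ = Fr(4), Fr(1)
X1, X2 = Fr(7), Fr(11)

LAM1, LAM2 = best_lambdas(A11, A21, B1, S1SQ, S2SQ)
T_DIRECT = LAM1 * X1 + LAM2 * X2
T_MARKOFF = markoff_estimate(A11, A21, B1, S1SQ, S2SQ, X1, X2)

if __name__ == "__main__":
    print(LAM1, LAM2, T_DIRECT, T_MARKOFF, variance(LAM1, LAM2, S1SQ, S2SQ))
```

With these values the two routes give the same estimate, and any other choice of $\lambda_2$ in (3') leaves the estimate unbiased but increases its variance.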


4. Neyman’s modification of Markoff’s Method. We are indebted to Ney- 
man for a modification and adaptation of the Markoff method so as to make 
the method applicable to some of the problems of stratified random sampling. 
One of his examples will best illustrate the method. 

Suppose that a given population is divided into $n$ strata. Let the $j$th stratum 
contain $M_j$ items and let these items be $u_{j1}, u_{j2}, \ldots, u_{jM_j}$. The mean and 
the variance of this stratum are then 

\[ \bar{u}_j = \frac{1}{M_j} \sum_k u_{jk} \quad\text{and}\quad \sigma_j^2 = \frac{1}{M_j} \sum_k (u_{jk} - \bar{u}_j)^2 . \]

Let $T$ be the parameter $T = M_1\bar{u}_1 + M_2\bar{u}_2 + \cdots + M_n\bar{u}_n$, so that 
$T/(M_1 + \cdots + M_n)$, the mean of the population, is expressed as a linear function 
of the means of the $n$ strata. We draw at random a sample of $N$ items, the 
sample consisting of $n$ partial samples, one partial sample being drawn from 
each of the $n$ strata. Suppose there are $n_1$ items in the partial sample from 
the first stratum, $n_2$ from the second, and so on. Thus $n_1 + \cdots + n_n = N$ 
and the entire sample consists of the $n$ partial samples 

\[
\begin{aligned}
&x_{11}, x_{12}, \ldots, x_{1n_1}, \\
&x_{21}, x_{22}, \ldots, x_{2n_2}, \\
&\;\;\vdots \\
&x_{n1}, x_{n2}, \ldots, x_{nn_n}.
\end{aligned}
\]

From these $N$ data we propose constructing an estimate 

\[ T' = \lambda_{11} x_{11} + \cdots + \lambda_{1n_1} x_{1n_1} + \cdots + \lambda_{n1} x_{n1} + \cdots + \lambda_{nn_n} x_{nn_n} \]

which will be the best estimate of $T$. Now the expected value of $T'$ is 

\[ E[T'] = E\Big[ \sum_j \sum_k \lambda_{jk} x_{jk} \Big] = \sum_j \sum_k \lambda_{jk} E(x_{jk}) \]

which, by hypothesis, must equal $T$. Thus 

\[ \sum_j \bar{u}_j \sum_k \lambda_{jk} = \sum_j M_j \bar{u}_j \]

identically in the $\bar{u}$’s. Hence $\sum_j \bar{u}_j (M_j - \sum_k \lambda_{jk}) = 0$ which requires that the 
coefficients of $\bar{u}_1, \bar{u}_2, \ldots, \bar{u}_n$ must be zero. That is 

\[ \sum_{k=1}^{n_j} \lambda_{jk} = M_j, \qquad j = 1, 2, \ldots, n. \]


Of course there are infinitely many $\lambda$’s which will satisfy these equations. But 
we can eliminate all but one set by imposing the condition that $\sigma_{T'}^2$ shall be a 
minimum. The algebra of mathematical expectation can be used to show that 

\[ \sigma_{T'}^2 = \sum_j \sigma_j^2\, \frac{M_j}{M_j - 1} \left[ \frac{M_j - n_j}{M_j n_j} \Big( \sum_k \lambda_{jk} \Big)^2 + \sum_k \Big( \lambda_{jk} - \frac{1}{n_j} \sum_i \lambda_{ji} \Big)^2 \right] \]

which will be a minimum when $\sum_k \big( \lambda_{jk} - \frac{1}{n_j} \sum_i \lambda_{ji} \big)^2 = 0$, $j = 1, 2, \ldots, n$. Since 
this is a sum of real squares, each term in the sum must be zero. Thus, 
$\lambda_{jk} = \frac{1}{n_j} \sum_i \lambda_{ji}$. Since $\sum_i \lambda_{ji}$ must equal $M_j$ in order that $E(T') = T$, then 
$\lambda_{jk} = M_j/n_j$ which uniquely determines the $\lambda$’s and hence our best estimate of $T$. 
i 


It is important to observe that Neyman’s adaptation does not assume that 
the various strata are uncorrelated nor that there are necessarily replacements 
after each drawing in taking the sample. 
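
For a population small enough to enumerate, the result $\lambda_{jk} = M_j/n_j$ can be checked directly. The sketch below lists every stratified sample drawn without replacement from two hypothetical strata, forms $T'$ with these weights, and compares the exact mean and variance of $T'$ with $T$ and with the value $\sum_j \sigma_j^2 \frac{M_j}{M_j - 1} \cdot \frac{M_j (M_j - n_j)}{n_j}$ to which the variance reduces for these weights. The strata, the sample sizes and the function names are assumptions of this illustration, and the partial samples are here taken independently from stratum to stratum.

```python
from fractions import Fraction
from itertools import combinations, product


def stratified_estimates(strata, sizes):
    """Yield T' = sum_j (M_j/n_j) * (sum of partial sample j) for every
    equally likely stratified sample drawn without replacement."""
    per_stratum = []
    for items, n_j in zip(strata, sizes):
        weight = Fraction(len(items), n_j)
        per_stratum.append([weight * sum(s) for s in combinations(items, n_j)])
    for parts in product(*per_stratum):
        yield sum(parts)


def mean_and_variance(values):
    """Exact mean and variance of a finite list of equally likely values."""
    values = list(values)
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def formula_variance(strata, sizes):
    """sum_j sigma_j^2 * M_j/(M_j - 1) * M_j (M_j - n_j)/n_j, sigma_j^2 with divisor M_j."""
    total = Fraction(0)
    for items, n_j in zip(strata, sizes):
        m = len(items)
        mean = Fraction(sum(items), m)
        var = sum((u - mean) ** 2 for u in items) / m
        total += var * Fraction(m, m - 1) * Fraction(m * (m - n_j), n_j)
    return total


# Hypothetical population of two strata and the partial sample sizes.
STRATA = [[1, 2, 6], [10, 14]]
SIZES = [2, 1]
T = sum(sum(items) for items in STRATA)

MEAN, VAR = mean_and_variance(stratified_estimates(STRATA, SIZES))

if __name__ == "__main__":
    print(T, MEAN, VAR, formula_variance(STRATA, SIZES))
```

The enumeration gives an expected value of $T'$ equal to $T$ exactly, and a variance agreeing with the formula.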


5. Estimation of Ratios. In certain problems in representative sampling it 
may be necessary to estimate both the numerator and the denominator of a 
fraction, say $T/U$. If $T'$ and $U'$ are linear estimates of $T$ and $U$ then for large 
samples both $T'$ and $U'$ will be approximately normally distributed in most 
cases. Further, if $T'$ and $U'$ are correlated, they will usually be approximately 
normally correlated. Geary has proved that if we write 

\[ V = \frac{b + T'}{a + U'}, \]

where $a$ and $b$ are constants and $U'$ and $T'$ are measured from their expected 
values, then 

\[ \frac{aV - b}{\sqrt{V^2\sigma_{U'}^2 - 2rV\sigma_{T'}\sigma_{U'} + \sigma_{T'}^2}} \]

is approximately normally distributed with mean zero and unit variance provided 
$a > 3\sigma_{U'}$. Here $r$ is the correlation coefficient between $T'$ and $U'$. For 
large samples this provides a convenient method of testing the significance of 
the difference between an observed and a hypothetical ratio of two linear 
estimates. 
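
The quantity just described is easily computed once the constants are given. The sketch below evaluates it for an observed ratio $V$, treating $b/a$ as the hypothetical ratio; it declines to return a value when the stated condition on $a$ is not met. The function name, the argument names and the numerical values in the example are assumptions of this illustration, and the resulting value would be referred to the normal curve in the usual way.

```python
import math


def geary_statistic(v, a, b, sigma_t, sigma_u, r):
    """(a V - b) / sqrt(V^2 sigma_U^2 - 2 r V sigma_T sigma_U + sigma_T^2).

    v        observed ratio (b + T') / (a + U')
    a, b     expected values of denominator and numerator (hypothetical ratio b/a)
    sigma_t  standard deviation of the numerator estimate
    sigma_u  standard deviation of the denominator estimate
    r        correlation coefficient between the two estimates
    """
    if a <= 3 * sigma_u:
        raise ValueError("the approximation is stated only for a > 3 * sigma_u")
    denom = math.sqrt(v * v * sigma_u ** 2 - 2 * r * v * sigma_t * sigma_u + sigma_t ** 2)
    return (a * v - b) / denom


if __name__ == "__main__":
    # Hypothetical figures: expected numerator 4, expected denominator 10,
    # observed ratio 0.5 against the hypothetical ratio 0.4.
    print(geary_statistic(0.5, a=10.0, b=4.0, sigma_t=1.0, sigma_u=2.0, r=0.0))
```

An observed ratio equal to $b/a$ gives the value zero, and the magnitude of the statistic grows as the observed ratio departs from the hypothetical one.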


6. Fiducial Inference. After an estimate of a parameter has been made, it 
is usually desirable to make some inference about the true value of the pa- 
rameter. For many years the concept of probable error was used in this con- 
nection. But the use of the probable error involves the assumption that all 
values of the unknown parameter are equally likely. This assumption is 
questionable and efforts to avoid making the assumption have led to a theory 
called fiducial inference. This method of statistical inference has broad implica- 
tions but limitations on our time do not permit our discussing the topic. At 
the close of this paper, we give certain references to the subject, including some 
of an expository nature. 


7. Conclusion. As stated in the introduction, this paper purports to give 
an exposition of some of the topics in mathematical statistics which find applica- 
tion in the representative method of sampling. Necessarily considerable 
selection of material had to be made. We believe, however, that the problem 
of the best estimate and an appropriate method of obtaining such an estimate 
are fundamental, and we hope that our exposition has helped to make clear 
these concepts of mathematical statistics which have proved so useful in the 
representative method. 
1. Introduction. There are a number of fields in which experimental data 
cannot be treated with any success by means of the usual “Student’s” test and— 
very probably—by means of the more general analysis of variance z-test of 
Fisher. It is known in fact [1] that the t-test, as applied to two samples, is 
only valid when the populations from which the samples are drawn have equal 
variances. As the z-test is of a nature similar to the t-test, with the difference 
that it is applied to detect differentiation in means of more than two popula- 
tions, a similar conclusion seems very likely. Thus, whenever we have to 
compare means of populations with distinctly different variances, we have to 
look for some new tests. It may be useful to mention at once two instances 
in which the situation mentioned actually arises. 

As a first instance we may quote certain entomological experiments. Suppose 
it is desired to test the efficiency of several treatments intended to destroy 
certain larvae on a field. The experiments are arranged in the usual way. 
The treatments compared are applied to particular plots with several replica- 
tions and then the plots (or smaller parts of them) are inspected and all the 
surviving larvae are counted. Thus the observations represent the numbers 
of surviving larvae in several equal areas. It happens frequently that, while 
there is room for doubt as to whether there is any significant difference between 
the average number of survivors corresponding to particular treatments, there 
is no doubt whatever that the variability of the observations differs from treat- 
ment to treatment. 

We have another similar case in bacteriology. The experiments I have in 
mind consist in determining the bacterial density by the so called “plating 
method.” This consists in taking a number of samples of the analyzed liquid 
and in spreading them separately on Petri plates. After a suitable period of 
time a number of colonies appear on the plates and their numbers represent the 
observational figures. I am informed that the variability of such observations 
does not depend very much on the technique of mixing the liquid and of taking 
the samples—when this technique is on a proper level—but does considerably 
depend on the kind and on the number of bacteria present in the liquid. 

The above examples justify an effort to find some new and more appropriate 
test. The first step in this direction must consist in an analysis of the ma- 
chinery behind the observable distributions and in deducing their analytical 
form. Once this problem is solved and repeated comparisons show a satis- 
factory agreement between the theory and the observation, we may proceed to 
the next step and deduce the appropriate tests. 

The purpose of the present paper consists in deducing a family of distribu- 
tions which provide a reasonably good fit in several cases in which they have 
been tested. It may be hoped that they will prove satisfactory also in many 
cases in the future. 


2. Distribution of larvae in experimental plots. When the problem of the 
distribution of larvae in experimental plots first arose, attempts were made to 
fit the Poisson Law of frequency. These attempts, however, failed almost in- 
variably with the characteristic feature that, as compared with the Poisson 
Law, there were too many empty plots and too few plots with only one larva. 
A similar circumstance is frequently, though not so regularly, observed in 
counts of microorganisms in single squares of a haemacytometer. These facts 
suggest that the distributions considered belong to a class which Pólya [3]
proposed to call “contagious”: the presence of one larva within an experimental
plot increases the chance of there being some more larvae. And it is not
difficult to see the cause of this dependence. Larvae are hatched from eggs which
are being laid in so-called “masses.” After being hatched they begin to travel
in search of food. Their movements are slow and therefore, whenever in a 
given plot we find a larva, this means that the mass of eggs, from which it was 
hatched, must have been laid somewhere near, and this in turn means that we 
are likely to find in the same plot some more larvae from the same litter. Of
course, there may also be others coming from other litters.

A similar explanation may apply also to microorganisms counted in single 
squares of a haemacytometer or to colonies on parallel plates. However, here 
the situation does not seem as clear as in the case of larvae. As far as the 
haemacytometer counts are concerned, also another cause of contagiousness 
may be suggested. Witnessing once the process of preparation of the experi- 
ment, I noticed that, immediately after the drop of liquid was deposited into 
the chamber of the haemacytometer and for some time after, the positions of 
cells seen under the microscope were not fixed. Some of them seemed to lie 
on the bottom and the others were floating downwards in an irregular
movement. Trying to follow the movements of particular cells I had the impression
that they were slightly attracted by the cells already stationary or
semi-stationary on the bottom of the chamber. If this impression of mine is justified,
then the attraction of the floating cells by those already on the bottom could 
explain the contagiousness of the resulting distribution. It is known, how- 
ever, that this contagiousness is always rather small and that frequently the 
distribution of cells in the squares of the haemacytometer does follow the 
Poisson Law very closely. 

Owing to the fact that the cause of the contagiousness of the distribution 
of larvae in experimental plots is clear, we shall deal primarily with the distri- 
bution of larvae. Consequently, if the theoretical distributions that we shall 
deduce fit the empirical ones, we shall be more or less justified in assuming that 
we guessed the essential features of the actual machinery of movements of the 
larvae. On the other hand, if the same theoretical distributions appear also
to fit satisfactorily the empirical counts of bacteria, then in respect of these
applications it will be safer to consider that we were lucky enough to find a
sufficiently flexible interpolation formula.

After these preliminaries we may proceed to a more accurate specification 
of the conditions of the problem considered. The experimental plot in which 
the larvae are counted will be denoted by P. We shall make no restriction 
as to the shape of this plot, but we shall assume that its area, which we shall 
take as unity, is small compared with that of the experimental field, F. The 
latter will be assumed to possess M units of area. We shall further assume 
that the moths laying eggs on the field F select spots for this purpose in a purely 
random manner. This presupposes that the experimental field is uniform in 
many relevant respects, e.g. is sown in all its parts by the same kind of plant, etc.
Denoting by $\xi$ and $\eta$ the coordinates of the mass of eggs laid by some particular
moth on the field F, we shall treat them as random variables with the elementary
probability law


(1)  $p(\xi, \eta) = \dfrac{1}{M}$


everywhere within F and zero elsewhere. After the larvae are hatched from the 
eggs there will be some mortality among them. Let us denote by n the number 
of larvae hatched from the same mass of eggs, surviving at the moment when 
the counts are made. We shall treat n as a random variable and denote by
p(n) its probability law. At the present moment the writer has no information 
as to what may be the nature of the function p(n). Consequently it will remain 
in our calculations in its general form and, wishing to obtain some formulae 
for immediate calculations, we shall have to substitute for p(n) hypothetical 
formulae which, on intuitive grounds, may seem plausible. If the larvae counted 
are all more or less of the same age, there is a possibility that p(n) does not differ 
very much from the Poisson Law, but this point might be verified experimentally 
and we shall not insist on its being necessarily true. 

Consider now a single larva, survivor at the moment of observation, which
was hatched out at a point with coordinates $\xi$ and $\eta$. Denote by x and y the
coordinates of this larva at the moment of counts. We shall consider x and y as
random variables. It is obvious that the probability law of x and y must
depend on the values of $\xi$ and $\eta$. We shall assume that the dependence is
of a particular character; namely, that the probability law of x and y given
$\xi$ and $\eta$ is a function of the differences $x - \xi$ and $y - \eta$. We shall denote it by
$f(x - \xi,\, y - \eta)$.

There is very little that we may consider as known about the function
$f(x - \xi,\, y - \eta)$. It may be treated as describing the habits of travelling of the larvae.
There are some indications that there are certain directions in which the larvae
tend to travel rather than in others, but they are too vague to be taken into
consideration. Only one thing is certain: during the period of time between
the birth of the larvae and the moment that the counts are made the larvae are
able to travel only at some limited distance. Consequently we shall assume
that for sufficiently large values of $|x - \xi|$ and $|y - \eta|$ the function
$f(x - \xi,\, y - \eta)$ is identically zero. Otherwise we shall not make any further assumption
concerning $f(x - \xi,\, y - \eta)$, and it will remain arbitrary in our calculations until
we reach the final general formula.

While abstaining from making arbitrary assumptions concerning the habits 
of single larvae, we shall make one concerning the habits of several of them. 
This assumption, however, seems to be very plausible. We shall assume that 
the larvae have no social instincts, so that the random variables x and y
corresponding to one larva are independent of those corresponding to any other,
that is to say, apart from the possible dependence on the same pair of $\xi$ and $\eta$.

Denote by N the total number of masses of eggs laid on the field F and let 
$k_i$ be the number of larvae hatched from the i-th mass of eggs, surviving at the
moment of observation and present within some particular experimental plot P.
Finally let


(2)  $X = \sum_{i=1}^{N} k_i$


be the total number of larvae to be found within this plot. Our purpose will be 
to use the above hypotheses in order to determine the probability law of X. 
In doing so we shall first find that of any of the $k_i$'s. Obviously, when
considering just one variable $k_i$, it would be useless to retain the subscript i, so
that below we shall write simply k to denote the number of living larvae, to be
found within P, all of which were hatched from the same mass of eggs, situated
at some point $(\xi, \eta)$.

Let us first write the expression for the probability that one particular larva 
of that group will be found within P. This probability will be a function of 
$\xi$ and $\eta$ only, say


(3)  $P(\xi, \eta) = \iint_{P} f(x - \xi,\, y - \eta)\, dx\, dy.$


Given that the number of survivors of the mass of eggs of the point $(\xi, \eta)$
is n, the probability that exactly k of them will be found within P will be
represented by the binomial formula, say


(4)  $P\{k \mid n, \xi, \eta\} = \dfrac{n!}{k!\,(n-k)!}\, P^{k}(\xi, \eta)\,\bigl(1 - P(\xi, \eta)\bigr)^{n-k}.$


It will be noticed that in writing this formula we use the hypothesis that the 
larvae have no social instincts. 

Multiplying (4) by the probability law of $\xi$ and $\eta$, and integrating with respect
to those variables over the whole field F, we shall obtain the probability, $P\{k \mid n\}$,
that out of the n survivors of a mass of eggs, laid anywhere within F, exactly k
larvae will be found within P:


(5)  $P\{k \mid n\} = \dfrac{n!}{k!\,(n-k)!}\,\dfrac{1}{M} \iint_{F} P^{k}(\xi, \eta)\,\bigl(1 - P(\xi, \eta)\bigr)^{n-k}\, d\xi\, d\eta.$


Multiplying this result by p(n) and summing for all values of n, we shall 
obtain the absolute probability of k having any specified value. 

However, before doing so, we must use the hypothesis about the function
$f(x - \xi,\, y - \eta)$ to deduce certain consequences concerning the integral in (5).

Originally we did not make any assumption as to the origin of coordinates
on the field F. It will be now convenient to assume that it is located somewhere
within the experimental plot P, for example in its center or in any other easily
specified point. Owing to the particular property of the function $f(x - \xi,\, y - \eta)$
it will now follow that, for sufficiently large values of $|\xi|$ and $|\eta|$, the probability
$P(\xi, \eta)$ will be equal to zero. Let us denote by A the part of the experimental
field where $P(\xi, \eta) > 0$. Obviously A denotes the set of points, a, in F such that,
if a mass of eggs is laid in one of them, the distance of a from the plot P is not
too large for the larvae hatched in a to reach the plot P before the moment of
observation. Obviously also the plot P is included in A. Consequently the
area of A, to be denoted by the same letter A, must be greater than unity.
Owing to the lack of any precise knowledge of the nature of the function
$f(x - \xi,\, y - \eta)$ it is impossible to say anything about the shape of A.

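Let us now turn to the integral in (5). The function under this integral
changes its form according to whether the point $(\xi, \eta)$ is within or without A.
If k = 0, then the integral in (5) reduces to


(6)  $\iint_{F} \bigl(1 - P(\xi, \eta)\bigr)^{n}\, d\xi\, d\eta = M - A + \iint_{A} \bigl(1 - P(\xi, \eta)\bigr)^{n}\, d\xi\, d\eta.$

If however k > 0, then


(7)  $\iint_{F} P^{k}(\xi, \eta)\,\bigl(1 - P(\xi, \eta)\bigr)^{n-k}\, d\xi\, d\eta = \iint_{A} P^{k}(\xi, \eta)\,\bigl(1 - P(\xi, \eta)\bigr)^{n-k}\, d\xi\, d\eta.$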
Now we can write 


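(8)  $P\{k\} = \sum_{n} p(n)\, P\{k \mid n\},$

which gives in particular


(9)  $P\{k = 0\} = 1 - \dfrac{A}{M} + \dfrac{1}{M} \iint_{A} \sum_{n \ge 0} p(n)\,\bigl(1 - P(\xi, \eta)\bigr)^{n}\, d\xi\, d\eta$


and for k > 0


(10)  $P\{k\} = \dfrac{1}{M} \iint_{A} \sum_{n \ge k} \dfrac{n!}{k!\,(n-k)!}\, P^{k}(\xi, \eta)\,\bigl(1 - P(\xi, \eta)\bigr)^{n-k}\, p(n)\, d\xi\, d\eta.$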


This is the general form of the probability law of k, which involves two
unspecified functions p(n) and $P(\xi, \eta)$. We shall not analyze it but proceed to the
calculation of the characteristic function $\varphi_k(t)$ of k, which will then be used to
calculate that of X. We have


(11)  $\varphi_k(t) = \sum_{k=0}^{\infty} e^{itk}\, P\{k\}$


or, using (9) and (10), and after easy transformations 


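(12)  $\varphi_k(t) = 1 - \dfrac{A}{M}\left(1 - \dfrac{1}{A} \iint_{A} \sum_{n \ge 0} p(n)\,\bigl(P(\xi, \eta)\, e^{it} + 1 - P(\xi, \eta)\bigr)^{n}\, d\xi\, d\eta\right).$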

Owing to the assumption that the larvae have no social instincts all the
variables $k_1, k_2, \ldots, k_N$ in (2) must be considered as mutually independent.
As the characteristic function of any of them has the same form (12), the
characteristic function, $\varphi_X(t)$, of their sum, X, will be represented by the Nth power
of the expression (12). Denoting by m the average number of masses of eggs
per unit of area of the field F, so that N = Mm, we shall have


(13)  $\varphi_X(t) = \bigl(\varphi_k(t)\bigr)^{N} = \left\{1 - \dfrac{A}{M}\left(1 - \dfrac{1}{A} \iint_{A} \sum_{n \ge 0} p(n)\,\bigl(P(\xi, \eta)\, e^{it} + 1 - P(\xi, \eta)\bigr)^{n}\, d\xi\, d\eta\right)\right\}^{Mm}$


This will be the characteristic function of X for any value of M. If it is desired
to put into effect the assumption that “M is large”, we shall have to consider the
limit of (13) for $M \to \infty$. This will be denoted by $\varphi(t)$ and we shall have


(14)  $\varphi(t) = \exp\left\{-Am\left(1 - \dfrac{1}{A} \iint_{A} \sum_{n \ge 0} p(n)\,\bigl(P(\xi, \eta)\, e^{it} + 1 - P(\xi, \eta)\bigr)^{n}\, d\xi\, d\eta\right)\right\}.$


In order to obtain the numerical value of the probability of X having any
specified value X', it remains only to specify the functions p(n) and $P(\xi, \eta)$ and
to use the familiar formula


(15)  $P\{X = X'\} = \dfrac{1}{2\pi} \int_{-\pi}^{+\pi} \varphi(t)\, e^{-itX'}\, dt.$
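As a matter of illustration only, formula (15) can be evaluated numerically. Because X takes only integer values, $\varphi(t)$ has the period $2\pi$ and the integral may be approximated by an equally spaced sum. The sketch below is not part of the derivation: the function name `pmf_from_cf`, the grid size and the Poisson test case with mean 2.5 are assumptions made for the example.

```python
import numpy as np

def pmf_from_cf(cf, k, n_grid=4096):
    """Approximate P{X = k} = (1/2pi) * integral over [-pi, pi] of cf(t) e^{-itk} dt."""
    # Uniform grid on [-pi, pi); for a 2*pi-periodic integrand the plain
    # average of the grid values is the trapezoidal approximation of (15).
    t = -np.pi + 2.0 * np.pi * np.arange(n_grid) / n_grid
    integrand = cf(t) * np.exp(-1j * t * k)
    return float(np.real(integrand.mean()))

# Hypothetical test case: characteristic function of a Poisson law with mean 2.5
mu = 2.5
poisson_cf = lambda t: np.exp(mu * (np.exp(1j * t) - 1.0))

p3 = pmf_from_cf(poisson_cf, 3)
print(p3, np.exp(-mu) * mu**3 / 6.0)   # the two values should agree closely
```

Any of the characteristic functions obtained below could be passed to the same routine in place of `poisson_cf`.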


3. Particular classes of the limiting distribution of X. Until we have some 
experimental evidence as to what might be the nature of the two functions 
p(n) and $f(x - \xi,\, y - \eta)$ or $P(\xi, \eta)$, we may try a few guesses. If the results
obtained in this way agree with empirical distributions, we shall have some
reason to think that the guesses are not altogether wrong.

In certain cases all the larvae considered are at the moment of observation 
approximately of the same age. Alternatively, we may count only larvae 
which are at the same stage of development. With such counts it is not un- 
reasonable to try for p(n) either the binomial or the Poisson formula. Either 
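of them will lead to easy calculations of (14). Writing


(16)  $p(n) = e^{-\lambda}\, \dfrac{\lambda^{n}}{n!}$


with $\lambda$ representing the average number of survivors at the moment of
observation per unit mass of eggs, we shall get for $\varphi(t)$ the following expression


(17)  $\varphi(t) = \exp\left\{-Am\left(1 - \dfrac{1}{A} \iint_{A} e^{\lambda P(\xi, \eta)\,(e^{it} - 1)}\, d\xi\, d\eta\right)\right\}.$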


Substituting here for $P(\xi, \eta)$ any suitable function we shall obtain a
corresponding particular form of the characteristic function $\varphi(t)$, so that (17)
determines a whole family of distributions. Substituting in (14) instead of (16),
say the binomial formula, we shall obtain another family of contagious
distributions.

Strictly speaking, in order to obtain some particular distribution from the
formula (17), we have to specify the function $f(x - \xi,\, y - \eta)$, then to calculate
$P(\xi, \eta)$ and substitute it in (17). Since however we have no knowledge of the
properties of $f(x - \xi,\, y - \eta)$ and have to select it only on intuitive grounds,
we may as well select the function $P(\xi, \eta)$. It may be selected either by itself
directly, in which case there will be no difficulty in substituting it in (17), or by
some indirect method. In the latter case we may find it more convenient to use
another form of (17) which is obtained by expanding the exponential under the
sign of the integral in (17) and by integrating term by term, which is obviously
permissible. In this way we get


(18)  $\log \varphi(t) = Am \sum_{n=1}^{\infty} \dfrac{\lambda^{n}\,(e^{it} - 1)^{n}}{n!}\, P_n,$


where $P_n$ stands for the expression


(19)  $P_n = \dfrac{1}{A} \iint_{A} P^{n}(\xi, \eta)\, d\xi\, d\eta$


and has the form of a moment of nth order of a certain probability law which
it is easy to determine.

We may consider for a moment the value of $P(\xi, \eta)$ as a random variable Z.
Its values cannot exceed the limits, zero and unity. Let z be any number
between zero and unity and denote by $A\,F(z)$ the measure of the set of points
belonging to A where $P(\xi, \eta) < z$. Then the function F(z) will possess all the
properties of the integral probability law of a variable Z which we may identify
with $P(\xi, \eta)$ and the integrals $P_n$ will be simply the moments of Z, namely
$P_n = \int_{0}^{1} z^{n}\, dF$, where, of course, the integral would be considered in the sense
of Stieltjes. It is interesting to notice that $P_1$ is always equal to $A^{-1}$. To see
this consider the integral


(20)  $A P_1 = \iint_{A} P(\xi, \eta)\, d\xi\, d\eta$


and substitute in it the expression of $P(\xi, \eta)$ in terms of the function
$f(x - \xi,\, y - \eta)$. We get


(21)  $A P_1 = \iint_{A} d\xi\, d\eta \iint_{P} f(x - \xi,\, y - \eta)\, dx\, dy$

(22)  $\qquad = \iiiint_{W} f(x - \xi,\, y - \eta)\, dx\, dy\, d\xi\, d\eta,$


where the four-dimensional region of integration W is defined as follows. (i)
The variables x and y vary so that the point having them for its coordinates
may have any position within, but cannot be outside, of the experimental plot P.
(ii) When x and y are fixed in the above way, say x = x' and y = y', then $\xi$ and $\eta$
may assume all those values for which the function $f(x' - \xi,\, y' - \eta)$ is positive.
Let us denote this system of values of $\xi$ and $\eta$ by B(x', y'). Then we can
calculate $A P_1$ as follows


(23)  $A P_1 = \iint_{P} dx\, dy \iint_{B(x, y)} f(x - \xi,\, y - \eta)\, d\xi\, d\eta.$


Now it is easy to see that the second integral in (23) is always equal to unity,
whatever be x and y satisfying (i). To see this we have to recall the
fundamental property of the function $f(x - \xi,\, y - \eta)$, due to the fact that it is the
elementary probability law of x and y, namely that if $\xi$ and $\eta$ are fixed in one
way or another, and it is integrated with respect to the other pair of variables,
over all their values for which it is positive, the result will be equal to unity.
In particular we shall have


(24)  $\iint_{f > 0} f(u, v)\, du\, dv = 1.$

Consider now the second integral in (23) and make the substitution

(25)  $u = x - \xi, \qquad v = y - \eta$


so that, instead of $\xi$ and $\eta$ we shall now integrate for u and v. It will be seen
that the result of this substitution is exactly the integral (24), equal to unity.
Since it was assumed that the area of P is equal to unity, it follows that $A P_1 = 1$.
This equality is thus the necessary condition that the function $P(\xi, \eta)$ must
satisfy. Besides, being a probability, it cannot be negative and cannot exceed
unity. Whether any function having these properties may play the role of
$P(\xi, \eta)$ must be left for further inquiry. Assuming temporarily that this is so
we can tentatively specify the probability laws belonging to the class determined
by (18) by substituting in (18) instead of the $P_n$'s the corresponding moments
$M_n$ of any distribution function F(z) with its range between zero and unity,
remembering only the interpretation of its first moment that we have found
above, namely $M_1 = P_1 = A^{-1}$.
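The condition $A P_1 = 1$ may be checked numerically for a concrete, purely hypothetical law of travel. The sketch below assumes that the plot P is the unit square centred at the origin and that f is uniform over a square of half-width r around the point of hatching; neither assumption is made in the text. Under these assumptions $P(\xi, \eta)$ is the product of two one-dimensional overlap fractions and A is a square of side 1 + 2r.

```python
import numpy as np

r = 1.5  # assumed maximal travel along each axis (hypothetical value)

def overlap_fraction(xi):
    """Share of the interval [xi - r, xi + r] that falls inside [-1/2, 1/2]."""
    overlap = np.minimum(xi + r, 0.5) - np.maximum(xi - r, -0.5)
    return np.clip(overlap, 0.0, None) / (2.0 * r)

side = 1.0 + 2.0 * r                          # side of the region A where P > 0
A = side ** 2                                 # area of A
n = 20000
h = side / n
xi = -side / 2.0 + (np.arange(n) + 0.5) * h   # midpoint grid along one axis

g = overlap_fraction(xi)
# Here P(xi, eta) = g(xi) * g(eta), so the double integrals over A factorize.
A_P1 = (g.sum() * h) ** 2                     # integral of P over A
A_P2 = ((g ** 2).sum() * h) ** 2              # integral of P^2 over A

print(A_P1, A_P2)
```

The first printed value should be close to unity whatever r is chosen, while the second depends on r and stays below unity.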


4. Certain general properties of the distributions deduced. Using the above 
result, we may substitute it in the formula (18) and get 
(26)  $\log \varphi(t) = m\lambda\,(e^{it} - 1) + Am \sum_{n=2}^{\infty} \dfrac{\lambda^{n}\,(e^{it} - 1)^{n}}{n!}\, P_n.$

Owing to the fact that the first term in the right hand side, $m\lambda\,(e^{it} - 1)$,
represents the logarithm of the characteristic function of the Poisson Law,


(27)  $p(x) = e^{-m\lambda}\, \dfrac{(m\lambda)^{x}}{x!}$

for x = 0, 1, 2, ..., the formula (26) is especially interesting. Comparing the
formulae


(28)  $P_1 = \int_{0}^{1} z\, dF = A^{-1}, \qquad P_n = \int_{0}^{1} z^{n}\, dF,$


we see that $0 < P_n < A^{-1}$ so that $A P_n < 1$. This circumstance assures the
absolute and uniform convergence of (26). Frequently the higher moments
$P_n$ will be much smaller than the first, $P_1$, and if this tends to zero, all the
products $A P_n$ for $n \ge 2$ will do so too. In those cases $\log \varphi(t)$ will tend to
$m\lambda\,(e^{it} - 1)$ uniformly for all values of t. To see this take an arbitrary $\varepsilon > 0$
and select N so large that


(29)  $m \sum_{n=N+1}^{\infty} \dfrac{(2\lambda)^{n}}{n!} < \dfrac{\varepsilon}{2}.$

Next let $A_0$ be large enough for

(30)  $A P_n < \dfrac{\varepsilon}{2m}\, e^{-2\lambda}$


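for all n = 2, 3, ..., N and for any $A > A_0$. For such values of A we shall
have


(31)  $\left| Am \sum_{n=2}^{\infty} \dfrac{\lambda^{n}\,(e^{it} - 1)^{n}}{n!}\, P_n \right| \le Am \left( \sum_{n=2}^{N} \dfrac{|e^{it} - 1|^{n}\, \lambda^{n}}{n!}\, P_n + \sum_{n=N+1}^{\infty} \dfrac{|e^{it} - 1|^{n}\, \lambda^{n}}{n!}\, P_n \right) < \varepsilon$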

independently of what is the value of t. This result may be formulated as

Proposition I. If the parameters m and $\lambda$ remain constant but the probability
law F(z) is changed so that all the products $A P_n$ tend to zero for n = 2, 3, ...,
then $\log \varphi(t)$ tends to $m\lambda\,(e^{it} - 1)$ uniformly for all values of t and, consequently, the
corresponding probability law of X tends to that of Poisson, given by (27).

The above proposition may be considered as an explanation of the circum- 
stance that occasionally the distribution of larvae may be very close to that of 
Poisson. This may happen for instance when the larvae that we count are 
sufficiently old and have had a sufficient time to travel very far from the spot 
where they were hatched. In such cases A will be large and, if the function
$f(x - \xi,\, y - \eta)$ has some appropriate properties, all the products $A P_n$ may be
very small. But it is interesting to notice that there is a possibility of A
increasing without the products $A P_n$ tending to zero. Such will be for instance
the case if $P(\xi, \eta)$ could have within A only two values $B_1(A)$ and $B_2(A)$ changing
with A, one close to unity and the other close to zero. If Ap and Aq are the
areas of the parts of A where $P(\xi, \eta)$ has those two different values, then we
shall have


(32)  $P_1 = p\,B_1(A) + q\,B_2(A) = A^{-1}, \qquad P_n = p\,B_1^{n}(A) + q\,B_2^{n}(A)$

and

(33)  $A P_n = \dfrac{p\,B_1^{n}(A) + q\,B_2^{n}(A)}{p\,B_1(A) + q\,B_2(A)}$

may tend to unity as A is increased. In such cases the probability law of X
will not tend to (27). While calling attention to this possibility, it should be
emphasized that it is not likely to occur in practice. In the cases of
discontinuous F(z) considered below P{X} does tend to (27). The same is true also
in such cases where it is assumed that


(34)  $\dfrac{dF}{dz} = a + bz > 0$ for $0 < z < c \le 1$, and $\dfrac{dF}{dz} = 0$ elsewhere,

etc.

Before proceeding to specialize the expression (26) of the logarithm of the
characteristic function, we shall show the connection existing between the $P_n$'s
and the semi-invariants of X. To calculate the latter it is sufficient to
differentiate (26) repeatedly with respect to t, to put t = 0, and to divide the result by the
appropriate power of i. Denoting by $\gamma_k$ the kth semi-invariant, by $\mu'_1$ the first
moment about zero, and by $\mu_k$ the kth central moment of X we easily get


(35)
$\gamma_1 = \mu'_1 = m\lambda$
$\gamma_2 = \mu_2 = m\lambda\,(1 + A\lambda P_2)$
$\gamma_3 = \mu_3 = m\lambda\,(1 + 3A\lambda P_2 + A\lambda^{2} P_3)$
$\gamma_4 = \mu_4 - 3\mu_2^{2} = m\lambda\,(1 + 7A\lambda P_2 + 6A\lambda^{2} P_3 + A\lambda^{3} P_4)$

It will be seen that, in general, the kth semi-invariant depends on $P_2, P_3, \ldots, P_k$
only. Another property of the new distributions that we shall mention
is that they are “stable”.

Proposition II. If $X_1, X_2, \ldots, X_s$ are s independent random variables all
following the same distribution with the logarithm of the characteristic function
given by (26), then the sum $Y = \sum_{i=1}^{s} X_i$ will follow the same probability law with
the exception that instead of the parameter m it will depend on the product sm.

In order to establish this proposition it is sufficient to notice that the logarithm 
of the characteristic function of the variable Y is equal to the expression (26) 
multiplied by s. 
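Both the formulae (35) and Proposition II lend themselves to a small numerical check, since the semi-invariants of a sum of independent variables are the sums of the semi-invariants of the terms. The sketch below is only an illustration: the function name `semi_invariants` and the parameter values are assumptions, and the choice $P_n = A^{-n}$ corresponds to a constant $P(\xi, \eta) = A^{-1}$.

```python
def semi_invariants(m, lam, A, P2, P3, P4):
    """First four semi-invariants of X as given by formulae (35)."""
    base = m * lam
    g1 = base
    g2 = base * (1 + A * lam * P2)
    g3 = base * (1 + 3 * A * lam * P2 + A * lam**2 * P3)
    g4 = base * (1 + 7 * A * lam * P2 + 6 * A * lam**2 * P3 + A * lam**3 * P4)
    return (g1, g2, g3, g4)

# Hypothetical values, chosen only for illustration
m, lam, A = 2.0, 3.0, 4.0
P2, P3, P4 = A**-2, A**-3, A**-4

single = semi_invariants(m, lam, A, P2, P3, P4)
s = 5
summed = semi_invariants(s * m, lam, A, P2, P3, P4)   # parameter m replaced by s*m

print(single)
print(summed)   # each entry is s times the corresponding entry of `single`
```

Replacing m by sm multiplies every semi-invariant by s, as it must if the sum of s independent variables follows the law with parameter sm.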

Lastly, it may be noticed that the family of distributions determined by (26)
is different from the comparable distributions deduced by Pólya ([3], p. 153,
formulae (40) and (41)). In fact the logarithms of the characteristic functions
of the latter could be written as follows:


(36)  $-a \log\bigl(1 - b\,(e^{it} - 1)\bigr) = ab\,(e^{it} - 1) + a \sum_{n=2}^{\infty} \dfrac{b^{n}\,(e^{it} - 1)^{n}}{n}$


and 


(37)  $\dfrac{c\,(e^{it} - 1)}{1 - d\,e^{it}} = \dfrac{c}{1 - d} \sum_{n=1}^{\infty} \left(\dfrac{d}{1 - d}\right)^{n-1} (e^{it} - 1)^{n}$

respectively and, even if the formal expansions in powers of $(e^{it} - 1)$ converge,
the identification of those expansions with (26) would require that $P_n$ possess
values exceeding unity, which is inconsistent with their essential property of
being successive moments of a positive variable $0 < Z \le 1$. Of course, the
convergence of (36) and (37) would impose special restrictions on the constants
that those formulae involve.


5. Contagious distribution of type A depending on two parameters. The 
simplest assumption that we can make concerning the function $P(\xi, \eta)$ is that it
possesses some constant positive value within A and is zero elsewhere. Owing
to (20) this constant value must be equal to $A^{-1}$. Substituting this in (17) we
immediately obtain, say


(38)  $\varphi_1(t) = \exp\left\{-Am\left[1 - \exp\left(\dfrac{\lambda}{A}\,(e^{it} - 1)\right)\right]\right\}.$


We could use the above formula directly to obtain the corresponding probability 
law. But before doing so, it may be useful to illustrate the machinery of the 
alternative method of obtaining the characteristic function of X and to calculate 
the same formula using (26). 

If $P(\xi, \eta)$ is equal to $A^{-1}$ everywhere in A, this means that the function F(z)
is a step function, which is equal to zero for any $z < A^{-1}$ and is equal to unity
elsewhere. Accordingly we shall have $M_n = A^{-n}$. Substituting this into (26)
instead of $P_n$ we easily get


(39)  $\log \varphi_1(t) = Am\left(\exp\left(\dfrac{\lambda}{A}\,(e^{it} - 1)\right) - 1\right),$


which is equivalent with (38).
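A small Monte Carlo sketch may help to see what kind of variable (38) describes. It rests on assumptions which go beyond the text and are made only for the purpose of illustration: that, for a large field, the number of masses of eggs laid within A is a Poisson variable with mean Am; that each mass leaves a Poisson number of survivors with mean $\lambda$, as in (16); and that each survivor is found in the plot with probability $A^{-1}$ independently of the others. The parameter values and the seed are arbitrary.

```python
import numpy as np

def simulate_counts(m, lam, A, n_plots, seed=0):
    """Simulate the number of larvae X in n_plots independent plots (constant P = 1/A)."""
    rng = np.random.default_rng(seed)
    masses = rng.poisson(A * m, size=n_plots)   # masses of eggs within reach of the plot
    survivors = rng.poisson(lam * masses)       # survivors, summed over those masses
    return rng.binomial(survivors, 1.0 / A)     # survivors actually found within the plot

m, lam, A = 2.0, 3.0, 4.0                       # hypothetical parameter values
x = simulate_counts(m, lam, A, n_plots=200_000)

print("mean    :", x.mean(), " expected about", m * lam)
print("variance:", x.var(), " expected about", m * lam * (1 + lam / A))
```

The expected values quoted in the output are the first two semi-invariants (35) with $P_2 = A^{-2}$; the excess of the variance over the mean is the mark of contagiousness.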

We shall now proceed to the calculation of the probabilities P{X = k} as
determined by either (38) or (39). For this purpose it will be useful to notice
that the characteristic function (38) depends really on two parameters only,
which we shall denote by $m_1$ and $m_2$,


(40)  $m_1 = Am, \qquad m_2 = \lambda / A.$

In order to simplify the printing we shall further denote

(41)  $z = m_1 e^{-m_2}.$


Expanding the two first exponentials of the three involved in (38), we may write


(42)  $\varphi_1(t) = e^{-m_1} \sum_{k=0}^{\infty} \dfrac{m_2^{k}\, e^{itk}}{k!} \sum_{n=0}^{\infty} \dfrac{z^{n} n^{k}}{n!},$

where $n^{k}$ is to be read as unity when n = k = 0.

This is the form of the characteristic function which is the most convenient
when we have in mind applying the formula (15). In fact, it will be seen that
we may multiply (42) by $e^{-itk}$ and then integrate the series term by term.
Further, it will be noticed that, on integrating between the limits $-\pi$ and $+\pi$,
all the terms of the product will vanish except for the one which is independent
of t. Consequently, the result of substituting (42) in the right hand side of (15)
will be the coefficient of $e^{itk}$ in the expansion (42), so that


(43)  $P\{X = k\} = e^{-m_1}\, \dfrac{m_2^{k}}{k!} \sum_{n=0}^{\infty} \dfrac{z^{n} n^{k}}{n!}.$


As it is easy to verify, we have


(44)  $P\{X = 0\} = e^{-m_1 (1 - e^{-m_2})}$

and, for $k \ge 1$,


(45)  $P\{X = k\} = e^{-m_1}\, \dfrac{(-1)^{k}\, m_2^{k}}{k!} \left[\dfrac{d^{k}}{du^{k}}\, e^{m_1 e^{-u}}\right]_{u = m_2}.$


This formula gives an easy check of the identity $\sum_{n=0}^{\infty} P\{X = n\} = 1$. In fact,
the left hand side can be looked upon as a product of $e^{-m_1}$ by the Taylor
expansion, about the point $u = m_2$, of the function differentiated in (45),
evaluated at u = 0; this expansion is equal to $e^{m_1}$, so that the product
gives identically unity.
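The series (43) can also be summed directly, which gives an independent check of the identity just mentioned. The following is a minimal sketch: the function name `type_a_prob_series`, the number of terms retained and the parameter values are assumptions made for the example.

```python
import math

def type_a_prob_series(k, m1, m2, n_terms=200):
    """P{X = k} from the series (43), truncated after n_terms terms."""
    z = m1 * math.exp(-m2)                 # abbreviation (41)
    total = 0.0
    z_term = 1.0                           # running value of z**n / n!
    for n in range(n_terms):
        total += z_term * float(n) ** k    # in Python 0.0 ** 0 evaluates to 1.0
        z_term *= z / (n + 1)
    return math.exp(-m1) * m2 ** k / math.factorial(k) * total

m1, m2 = 2.0, 1.5                          # hypothetical parameter values
probs = [type_a_prob_series(k, m1, m2) for k in range(61)]

print(probs[0], math.exp(-m1 * (1.0 - math.exp(-m2))))   # compare with (44)
print(sum(probs))                                          # should be very close to 1
```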


Successive differentiations give in turn


(46)  $P\{X = 1\} = e^{-m_1 (1 - e^{-m_2})}\, m_1 m_2\, e^{-m_2}$

(47)  $P\{X = 2\} = e^{-m_1 (1 - e^{-m_2})}\, \dfrac{m_2^{2}}{2!}\, \bigl(m_1^{2} e^{-2 m_2} + m_1 e^{-m_2}\bigr)$


etc. Comparing the formulae (44), (46) and (47), the effect of the
“contagiousness” of the distribution is easily seen. P{X = 2} differs from what it would
have been, if the distribution was that of Poisson, by the additional term $m_1 e^{-m_2}$
within the brackets.

Formulae (44), (46) and (47), and others which could be obtained by differ- 
entiating as indicated in (45), could be used for numerical calculations. How- 
ever, these are greatly simplified by the use of the following elegant formula, 
deduced by Dr. Geoffrey Beall of the Dominion Entomological Experimental 
Station, Chatham, Ontario. 


(48)  $P\{X = n + 1\} = \dfrac{m_1 m_2\, e^{-m_2}}{n + 1} \sum_{t=0}^{n} \dfrac{m_2^{t}}{t!}\, P\{X = n - t\}.$


The correctness of this formula may be easily checked by calculating
P{X = n - t} from (43) and by substituting it in (48). Simple rearrangements will
then give what could be obtained from (43) by putting k = n + 1.
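The recurrence (48), started from (44), translates directly into a short routine. The sketch below is given only as an illustration; the function name `type_a_pmf` and the parameter values are assumptions.

```python
import math

def type_a_pmf(m1, m2, n_max):
    """Probabilities P{X = 0}, ..., P{X = n_max} from (44) and the recurrence (48)."""
    probs = [math.exp(-m1 * (1.0 - math.exp(-m2)))]        # starting value (44)
    factor = m1 * m2 * math.exp(-m2)
    for n in range(n_max):
        inner = sum(m2 ** t / math.factorial(t) * probs[n - t] for t in range(n + 1))
        probs.append(factor / (n + 1) * inner)              # recurrence (48)
    return probs

m1, m2 = 2.0, 1.5                      # hypothetical parameter values
p = type_a_pmf(m1, m2, 60)
mean = sum(k * pk for k, pk in enumerate(p))

print(sum(p))                          # should be very close to 1
print(mean, m1 * m2)                   # the mean should be close to m1 * m2
```

Each new probability requires only the probabilities already computed, which is what makes the recurrence more convenient than repeated differentiation as in (45).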
Substituting $P_n = A^{-n}$ in formulae (35) and taking account of (40), we get


(49)  $\mu'_1 = \lambda m = m_1 m_2,$


(50)  $\mu_2 = \lambda m \left(1 + \dfrac{\lambda}{A}\right) = m_1 m_2\,(1 + m_2).$


Solving these equations for $m_1$ and $m_2$ we obtain the formulae

(51)  $m_2 = (\mu_2 - \mu'_1)/\mu'_1, \qquad m_1 = \mu'_1 / m_2.$
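Formulae (51) can be turned into a simple fitting routine based on the first two moments. The sketch below is an illustration under stated assumptions: the function name `fit_type_a_by_moments` is hypothetical, the sample of counts is made up, and the second central moment is computed with the divisor n.

```python
import numpy as np

def fit_type_a_by_moments(counts):
    """Moment estimates (m1, m2) obtained from formulae (51)."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()                  # first moment about zero
    var = counts.var()                    # second central moment (divisor n)
    if var <= mean:
        raise ValueError("variance must exceed the mean for m2 to be positive")
    m2 = (var - mean) / mean
    m1 = mean / m2
    return m1, m2

sample = [0, 0, 0, 1, 2, 5, 0, 3, 7, 0]   # made-up counts, for illustration only
m1_hat, m2_hat = fit_type_a_by_moments(sample)
print(m1_hat, m2_hat)
```

When the empirical variance does not exceed the mean, (51) would give a zero or negative value of $m_2$, and the sketch refuses to return an estimate in that case.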


If the moments $\mu'_1$ and $\mu_2$ are determined for an empirical distribution, these
formulae may be used for estimating $m_1$ and $m_2$. In cases which were tried,
this process did give frequently a satisfactory fit. Sometimes, however, when
the tail of the original empirical distribution was very irregular, this distribution
was better approximated by calculating the moments $\mu'_1$ and $\mu_2$ not from itself
but after a certain amount of smoothing of the tail. It follows that the method
of fitting the new distribution to the empirical data requires some further study. 
At present it will suffice to mention that, whenever this distribution was tried 
on distributions of larvae which at the moment of counts were approximately 
of the same stage of development, the fit obtained was very satisfactory. It is 
hoped that a number of actual distributions fitted, together with the description 
of the method of counting, etc., will be soon published by Dr. Beall. As a matter 
of illustration one of his distributions is reproduced at the end of the present 
paper.

As for the distribution considered we have


(52)  $\lim_{A \to \infty} A P_n = \lim_{A \to \infty} A^{-(n-1)} = 0, \qquad n = 2, 3, \ldots$

It follows from the above theory that, as $A \to \infty$, the probability law (48) tends
to that of Poisson, namely

(53)  $\lim_{A \to \infty} P\{X = n\} = e^{-m\lambda}\, \dfrac{(m\lambda)^{n}}{n!}.$
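The approach to the Poisson law stated in (53) can be watched numerically by keeping m and $\lambda$ fixed and letting A grow, so that $m_1 = Am$ increases and $m_2 = \lambda / A$ decreases. The sketch below repeats, for self-containment, a routine based on (44) and (48); the function names and the chosen values of m, $\lambda$ and A are assumptions made for the example.

```python
import math

def type_a_pmf(m1, m2, n_max):
    """Probabilities P{X = 0}, ..., P{X = n_max} from (44) and the recurrence (48)."""
    probs = [math.exp(-m1 * (1.0 - math.exp(-m2)))]
    factor = m1 * m2 * math.exp(-m2)
    for n in range(n_max):
        inner = sum(m2 ** t / math.factorial(t) * probs[n - t] for t in range(n + 1))
        probs.append(factor / (n + 1) * inner)
    return probs

def max_gap_to_poisson(m, lam, A, n_max=40):
    """Largest absolute difference between the type A law and the Poisson law (27)."""
    p = type_a_pmf(A * m, lam / A, n_max)
    mu = m * lam
    q = [math.exp(-mu) * mu ** k / math.factorial(k) for k in range(n_max + 1)]
    return max(abs(a - b) for a, b in zip(p, q))

m, lam = 2.0, 3.0                                  # hypothetical parameter values
gaps = [max_gap_to_poisson(m, lam, A) for A in (2, 20, 200, 2000)]
print(gaps)                                        # the gaps should shrink as A grows
```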

For this reason the distribution (48) could be perhaps called the generalized 
probability law of Poisson, but it seems that the term “contagious distribution 
of type A with two parameters” will be more descriptive. Further on we shall 
see what is the justification of the description “of type A”.

It was stated at the outset of the present paper that, when comparing the 
distributions of larvae in two series of plots subjected to two different treat- 
ments, there is sometimes doubt whether the means of those distributions are 
equal or not, while the difference in variability is more or less obvious. The 
formulae (49) and (50) give us the explanation of these facts. It is seen from 
the formula (49) that the mean of the distribution is equal to the product of the 
mean number of masses of eggs per unit of area and of the mean number of 
larvae per mass of eggs surviving at the moment of counts. If the two treat- 
ments compared are of about the same efficiency of killing the larvae, then the 
values of $\lambda$ for each of them will be approximately equal and, consequently,
we shall obtain about the same values for the two means. But while being of an 
equal efficiency as far as the killing is concerned, the two treatments may annoy 
the larvae in an unequal way. For example if the first treatment is dummy 
(no treatment) and the other is in general ineffective, it may still spoil the taste 
of the leaves that the larvae feed on. In such a case they may be compelled to 
travel a little more than they would otherwise, which will lead to an increase in A. 
Looking at the formula (50), it is easy to see that this would lead to a decrease 
in the value of $\mu_2$. Alternatively the treatment may produce a temporary
paralysis of the larvae which may reduce A and bring an increase of $\mu_2$.

These remarks were applied to moments (49) and (50) of the particular dis- 
tribution (45), but looking at the formulae (35), it is easily seen that they are 
true in the general case also. 


6. Contagious distributions of type A depending on three parameters. As
mentioned before, in order to determine some particular contagious distribution 
contained in the class depending on equation (18) it is sufficient to substitute 
in it instead of the $P_n$ the moments of any distribution with its range confined
to the interval from zero to unity, with the only restriction that the reciprocal 
of the first moment should be equal to A. Obviously this could be done in an 
infinity of ways, all of which will give more or less different results. We shall 
select the following one, representing a natural generalization of the procedure 
adopted above and leading to very simple formulae.

Formerly we have assumed that $P(\xi, \eta)$ possesses a constant value $A^{-1}$ within
the whole area A. At present we may assume that within this area it may
possess one of two (three, four, etc.) values, say $B_1$ and $B_2$. Considering again
$P(\xi, \eta)$ as a random variable Z, this will be equivalent to an assumption that Z
may possess only one of the values $B_1$ and $B_2$, both positive and not exceeding
unity. Again the probabilities of $Z = B_i$ are at our disposal. We shall take
that these probabilities are equal, i.e. equal to ½.

Comparing these assumptions with what may be the actual situation, one 
may be led to think that they are rather artificial. This however is not so. 
There is no doubt that the value of $P(\xi, \eta)$ does change within $A$, and it is also
probable that the change is smooth. As we have no knowledge of the character
of this function we first take its mean value within the area $A$ and treat it as its
first approximation. Next we divide the area $A$ into two equal parts, say $A_1$
and $A_2$, so that the greatest value of $P(\xi, \eta)$ in $A_1$ does not exceed any of the
values in $A_2$. Then taking the average of $P(\xi, \eta)$ within $A_1$ and a similar average
within $A_2$ and denoting them by $B_1$ and $B_2$ respectively, we do obtain a
better approximation to the actual values of $P(\xi, \eta)$ assuming that it is equal
to $B_i$ everywhere in $A_i$. That is, in fact, the real meaning of the hypothesis
formulated above and that we are going to accept in the following.

Denoting again by $M_n$ the moments of $Z$ we shall have

(54)    $M_1 = \tfrac{1}{2}(B_1 + B_2) = A^{-1}$

and generally

(55)    $M_n = \tfrac{1}{2}(B_1^n + B_2^n)$.

Substituting (55) in (26) we get, say

(56)    $\phi_2(t) = \exp\{\tfrac{1}{2} Am\,(e^{\lambda B_1(e^{it}-1)} + e^{\lambda B_2(e^{it}-1)} - 2)\}$.

We notice that this expression depends on three parameters, say

(57)    $m_1 = Am, \quad m_2 = \lambda B_1, \quad m_3 = \lambda B_2$.


In order to get the formulae for the probabilities of X having any specified values 
we could again apply the method used above when treating the more simple case. 
It may be useful however to illustrate a shorter way which easily leads to a 
generalization of Dr. Beall’s recurrence formula. As we have noticed before, 
the probability $P\{X = k\}$ is equal to the coefficient of $e^{ikt}$ in the expansion of the
characteristic function in powers of $e^{it}$. Substituting for simplicity $z = e^{it}$,
so that $t = -i \log z$, we may say that, if $\phi(t)$ is the characteristic function of a
variable $X$, which is able to possess only integer values, then $P\{X = k\}$ is equal
to the coefficient of $z^k$ in the expansion of, say, $\psi(z) = \phi(-i \log z)$. Applying
this rule to (56) we can write the following expression for the generating function
$\psi(z)$,

(58)    $\psi(z) = e^{-m_1} \exp\{\tfrac{m_1}{2}(e^{m_2(z-1)} + e^{m_3(z-1)})\} = \sum_{k=0}^{\infty} z^k P\{X = k\}$.
In other words

(59)    $P\{X = 0\} = \psi(0) = e^{-m_1} e^{\frac{m_1}{2}(e^{-m_2} + e^{-m_3})}$,

(60)    $P\{X = k\} = \frac{1}{k!} \frac{d^k \psi}{dz^k}\Big|_{z=0}, \quad k = 1, 2, \dots$.
But

(61)    $\frac{d\psi}{dz} = \frac{m_1}{2}\,\psi(z)\,\{m_2 e^{m_2(z-1)} + m_3 e^{m_3(z-1)}\} = \frac{m_1}{2}\,\psi(z)\chi(z)$ (say)

and it is easy to see that generally

(62)    $\frac{d^k \chi}{dz^k} = m_2^{k+1} e^{m_2(z-1)} + m_3^{k+1} e^{m_3(z-1)}$.


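As the $k$th derivative of $\psi(z)$ in (60) may be calculated by applying the familiar
formula for the $(k - 1)$st derivative of the product $\psi(z)\chi(z)$ in (61), we obtain

(63)    $\frac{d^{n+1}\psi}{dz^{n+1}}\Big|_{z=0} = \frac{m_1}{2} \sum_{k=0}^{n} \frac{n!}{k!\,(n-k)!}\, \frac{d^k \chi}{dz^k}\Big|_{z=0}\, \frac{d^{n-k}\psi}{dz^{n-k}}\Big|_{z=0}$.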


Using the formulae (60) and (62) we immediately obtain 


(64)    $P\{X = n+1\} = \frac{m_1}{2(n+1)} \sum_{k=0}^{n} \frac{m_2^{k+1} e^{-m_2} + m_3^{k+1} e^{-m_3}}{k!}\, P\{X = n-k\}$.

As whenever $B_1 = B_2$, and consequently $m_2 = m_3$, the distribution considered
now becomes identical with that considered formerly, depending on two parameters
only, it is seen that the formula (64) represents a direct generalization
of the formula (48). For purposes of successive calculation of the probabilities
it will be probably more convenient to write (64) in the following form

(65)    $P\{X = n+1\} = \frac{m_1 m_2 e^{-m_2}}{2(n+1)} \sum_{k=0}^{n} \frac{m_2^k}{k!}\, P\{X = n-k\} + \frac{m_1 m_3 e^{-m_3}}{2(n+1)} \sum_{k=0}^{n} \frac{m_3^k}{k!}\, P\{X = n-k\}$.

This device of finding a recurrence formula for the probabilities will always 
succeed whenever there are no difficulties in finding the value of the $n$th derivative
of the function $\chi$.
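
The successive calculation described by (59) and (65) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `type_a3_pmf` and the truncation point `n_max` are assumptions introduced here and do not appear in the paper.

```python
import math


def type_a3_pmf(m1, m2, m3, n_max):
    """Return [P{X=0}, ..., P{X=n_max}] for the three-parameter type A law.

    Hypothetical helper: starts from (59) and applies the recurrence (65).
    """
    # (59): P{X = 0} = exp(-m1) * exp((m1/2) * (e^{-m2} + e^{-m3}))
    probs = [math.exp(-m1 + 0.5 * m1 * (math.exp(-m2) + math.exp(-m3)))]
    # Constant factors of the two sums in (65), before division by (n + 1)
    c2 = 0.5 * m1 * m2 * math.exp(-m2)
    c3 = 0.5 * m1 * m3 * math.exp(-m3)
    for n in range(n_max):
        sum2 = sum3 = 0.0
        term2 = term3 = 1.0  # m2^k / k! and m3^k / k!, starting at k = 0
        for k in range(n + 1):
            sum2 += term2 * probs[n - k]
            sum3 += term3 * probs[n - k]
            term2 *= m2 / (k + 1)
            term3 *= m3 / (k + 1)
        probs.append((c2 * sum2 + c3 * sum3) / (n + 1))
    return probs
```

When `m2 == m3` the result coincides with the two-parameter law, in agreement with the remark following (64).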

It may be easily shown that if $m$ and $\lambda$ remain fixed but $A$ tends to infinity,
then the distribution (60) tends to the Poisson Law of frequency. Owing to
the general result stated in Proposition I, in order to show this it is sufficient
to prove that for $n \ge 2$

(66)    $\lim_{A \to \infty} A M_n = \lim_{A \to \infty} A\,\frac{B_1^n + B_2^n}{2} = 0$.

As both $B_1$ and $B_2$ must be included between zero and unity and their sum is
equal to $2A^{-1}$, it follows that

(67)    $0 < B_1 \le A^{-1} \le B_2 < 2A^{-1}$.

Therefore

(68)    $0 < A M_n < \frac{2^n}{A^{n-1}}$

and (66) becomes obvious.


Substituting the values of $M_2$ and $M_3$ instead of $P_2$ and $P_3$ in the general
expressions (35) of the moments, and taking into account the formulae (57),
we obtain

(69)    $\mu_1' = \tfrac{1}{2} m_1 (m_2 + m_3)$
        $\mu_2 = \tfrac{1}{2} m_1 (m_2 + m_3 + m_2^2 + m_3^2)$
        $\mu_3 = \tfrac{1}{2} m_1 (m_2 + m_3 + 3(m_2^2 + m_3^2) + m_2^3 + m_3^3)$.


If it is desired to fit the distribution to some empirical one using the method of
moments, then these formulae could be solved with respect to $m_1$, $m_2$ and $m_3$.
We may proceed as follows. Write

(70)    $a = 2\mu_1', \quad b = 2(\mu_2 - \mu_1'), \quad c = 2(\mu_3 - 3\mu_2 + 2\mu_1')$.

Then

(71)    $m_1 (m_2 + m_3) = a$

(72)    $m_1 (m_2^2 + m_3^2) = b$

(73)    $m_1 (m_2^3 + m_3^3) = c$.


Multiplying the first of these equations by $m_2$ and subtracting the result from
the second, and repeating the same process with the second equation and the
third, we get

(74)    $m_1 m_3 (m_3 - m_2) = b - a m_2$
        $m_1 m_3^2 (m_3 - m_2) = c - b m_2$

and it follows

(75)    $m_3 = \frac{c - b m_2}{b - a m_2}$

or

(76)    $\frac{b}{a}(m_2 + m_3) - m_2 m_3 = \frac{c}{a}$.

Again, dividing (73) by (71) we get

(77)    $(m_2 + m_3)^2 - 3 m_2 m_3 = \frac{c}{a}$.

Multiplying (76) by 3 and subtracting from (77), we obtain

(78)    $s^2 - 3bs/a + 2c/a = 0$,

where $s = m_2 + m_3$. It follows that

(79)    $s = \frac{3b}{2a} \pm \sqrt{\left(\frac{3b}{2a}\right)^2 - \frac{2c}{a}}$

(80)    $m_2 m_3 = p = \frac{bs - c}{a}$

(81)    $m_2 = \tfrac{1}{2}\left(s - \sqrt{s^2 - 4p}\right)$

(82)    $m_3 = \tfrac{1}{2}\left(s + \sqrt{s^2 - 4p}\right)$

(83)    $m_1 = a/s$.


Following these steps we finally arrive at the values of all three parameters,
given by the last three formulae.
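
The steps (70) to (83) may be sketched as a small Python routine. It is a minimal sketch under stated assumptions: the function name `fit_type_a3` is hypothetical, the larger root of the quadratic in $s$ is taken (an assumption, chosen because it is the root that recovers $s = 2m_2$ when $m_2 = m_3$), and `None` is returned whenever a square root would have a negative argument.

```python
import math


def fit_type_a3(mu1, mu2, mu3):
    """Method-of-moments sketch for (m1, m2, m3).

    mu1 is the mean, mu2 and mu3 the second and third central moments.
    Returns None when the calculations cannot be carried out in real numbers.
    """
    # (70): auxiliary quantities a, b, c
    a = 2.0 * mu1
    b = 2.0 * (mu2 - mu1)
    c = 2.0 * (mu3 - 3.0 * mu2 + 2.0 * mu1)
    half = 3.0 * b / (2.0 * a)
    disc = half * half - 2.0 * c / a
    if disc < 0.0:
        return None
    # Assumption: use the larger root of the quadratic for s = m2 + m3
    s = half + math.sqrt(disc)
    p = (b * s - c) / a              # (80): p = m2 * m3
    inner = s * s - 4.0 * p
    if s <= 0.0 or inner < 0.0:
        return None
    root = math.sqrt(inner)
    m2 = 0.5 * (s - root)            # (81)
    m3 = 0.5 * (s + root)            # (82)
    m1 = a / s                       # (83)
    return m1, m2, m3
```

For instance, the moments $\mu_1' = 4$, $\mu_2 = 14$, $\mu_3 = 62$ correspond by (69) to $m_1 = 2$, $m_2 = 1$, $m_3 = 3$, and the routine returns these values.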

If the values of the moments $\mu_1'$, $\mu_2$ and $\mu_3$ were known without error, the
above formulae would give accurate values of $m_1$, $m_2$ and $m_3$. If, however,
the moments are estimated from a sample, then the reader must be prepared
that, even if the observed variable follows exactly the law, occasionally the
sampling errors in the moments will make it impossible to carry out all the
calculations indicated. Especially this may easily happen when the true values
of $m_2$ and $m_3$ are equal or nearly equal, so that the empirical distribution is close
to that given by the contagious distribution with only two parameters. As is
seen from (81) and (82), in such a case the true values of $s$ and $p$ must satisfy
the relation

(84)    $s^2 - 4p = 0$.

However, the sampling errors in the moments will ascribe to the left hand side
of (84) a value only approximately equal to zero, which may be either positive
or negative. In the latter case we shall not be able to use (81) and (82) to
estimate $m_2$ and $m_3$. As a matter of fact, the above circumstance actually arose
in one case when it was tried to fit the three-parameter distribution to a set
of data which were excellently fitted by a simpler formula (45) involving only
two parameters. As mentioned before, the problem of fitting the distributions
which are deduced here requires further consideration.

Looking back on the method by which we have substituted a contagious
distribution with three parameters $m_1$, $m_2$, $m_3$ for the simpler one with only
two parameters, it is easily seen that it can be carried further leading to
distributions with four, five, etc. parameters. In each case we would mentally divide
the area $A$ in a number of parts of equal size so that the values of $P(\xi, \eta)$ in the
first never exceed those in the second, etc. Denoting the average values of
$P(\xi, \eta)$ in those areas by $B_1, B_2, \dots, B_s$, we shall obtain the moments

(85)    $M_n = \frac{1}{s} \sum_{i=1}^{s} B_i^n$,

substitute them in (26) and proceed more or less as we did above. All the
distributions which may be obtained in this way possess certain common traits
and I propose to call them “of type A”. If the number of parameters in such a
distribution is sufficiently high, it seems practically certain that the function
$P(\xi, \eta)$ will be well approximated and we may hope to get an excellent fit.
However, if a good fit may be attained only by introducing a great number of
parameters, it usually means that the method of introducing those parameters
is not very successful, and therefore it does not seem worth while to discuss in
greater detail the distributions of type A with the number of parameters exceeding
three. Instead we shall briefly indicate another class of distributions, built
on another principle, which may be called of type B or C.


7. Contagious distributions of types B and C. As mentioned before, when- 
ever the distributions of type A were tried on data, the character of which did 
not obviously contradict the basic assumptions of the theory (approximate 
equality of age of the larvae), the results were always satisfactory. However, 
our present experience is rather limited and it is well to anticipate the failures. 
We may expect that these will be caused by the over-simplified assumptions 
concerning the function $P(\xi, \eta)$.

In order to deal with such a case we may assume that for $0 < z < 1$ the
derivative of $F(z)$ exists and is either a linear function of $z$ or is equal to zero.
Writing $p(z) = dF/dz$ we shall put

(86)    $p_1(z) = \tfrac{1}{2}A$ for $0 < z < 2A^{-1}$, $A \ge 2$
               $= 0$ elsewhere.

Alternatively we may write, say

(87)    $p_2(z) = \frac{2A^2}{9}(3A^{-1} - z)$ for $0 < z < 3A^{-1}$
               $= 0$ elsewhere.

In the first case we shall obtain, say

(88)    $M_n' = \frac{1}{n+1}\left(\frac{2}{A}\right)^n$.
On the other hand, the moments of $p_2(z)$ will be given by

(89)    $M_n'' = \frac{2(3A^{-1})^n}{(n+1)(n+2)}$.

Substituting these expressions in (26) we shall easily obtain the two new
forms of the characteristic function of $X$, say

(90)    $\log \phi_3(t) = -m_1 + m_1\,\frac{e^{m_2(e^{it}-1)} - 1}{m_2(e^{it} - 1)}$,

with

(91)    $m_1 = Am$ and $m_2 = 2\lambda/A$.


Accordingly, the generating function of the probabilities will be, say

(92)    $\psi_3(z) = e^{-m_1} \exp\left\{m_1\,\frac{e^{m_2(z-1)} - 1}{m_2(z-1)}\right\} = \sum_{n=0}^{\infty} z^n P\{X = n\}$.

The distribution determined by (92) may be called of type B.
Using the moments (89) and substituting them in the usual way in (26), we
obtain, say

(93)    $\log \phi_4(t) = -m_1 + 2m_1\,\frac{e^{m_2(e^{it}-1)} - 1 - m_2(e^{it} - 1)}{m_2^2 (e^{it} - 1)^2}$,

with

(94)    $m_1 = Am$ and $m_2 = 3\lambda/A$.


The probabilities of $X$ having any specified value will be generated by the
function, say

(95)    $\psi_4(z) = e^{-m_1} \exp\left\{2m_1\,\frac{e^{m_2(z-1)} - 1 - m_2(z-1)}{m_2^2 (z-1)^2}\right\} = \sum_{n=0}^{\infty} z^n P\{X = n\}$.

The probability law determined by (95) may be called of type C. The comparative
merits of all those distributions could be judged by comparing them
with the results of observation.


8. Illustrative Examples and Concluding Remarks. Any series of positive 
terms adding up to unity may be considered as determining a probability law 
of a discontinuous variable such as the X considered above. When trying to 
obtain probability laws fitting the empirical distributions of some particular 
origin, the distributions of the numbers of larvae in experimental plots, or the 
like, we could really start by considering series of some positive terms each 
depending on one or more parameters, say

(96)    $u_0(m_1, m_2),\ u_1(m_1, m_2),\ u_2(m_1, m_2),\ \dots,\ u_n(m_1, m_2),\ \dots$

and having the property that, whatever the values of those parameters,
$\sum_{n=0}^{\infty} u_n(m_1, m_2) = 1$. Studying a considerable number of empirical distributions,
we could apply the “method” of trial and error to guess the form of dependence
of the $u_n(m_1, m_2)$ on the $m$’s so that for a broad class of empirical distributions
there would be a system of values of the $m$’s, for which the series (96) would
satisfactorily fit the data. If we succeed in this task we shall be entitled to a
considerable satisfaction as the solution that we obtained would permit various
further studies, e.g. the deduction of tests of significance applicable, or approximately
applicable, in various cases, and so on.

Looking back at the history of statistics we shall find that the systems of 
frequency curves of Pearson, of Bruns-Charlier and others belong to the class 
of results just discussed. They are very important—and this especially applies 
to the Pearson curves—because of the empirical fact, that it is but rarely 
that we find in practice an empirical distribution, which could not be satisfac- 
torily fitted by any of such curves. Consequently, wishing to deduce some test 
applicable in this or that case, we may usefully assume that the basic distribution 
is one of the Pearson system and, owing to the frequently continuous character 
of the connection between the conditions and the final results, our final formula 
will be approximately valid when applied to the data under consideration. 

This point of view is not unfamiliar in pure mathematics. For example, we 
know that a broad class of functions may be approximated with any prescribed 
accuracy by means of polynomials. Wishing to prove a theorem applicable 
to this class of functions, we sometimes start by proving it for polynomials and 
then conclude that it is also true for the whole class. Here the rôle of polynomials
is perfectly analogous to that of Pearson curves and could be described
as that of good interpolation formulae. 

But the problem of deducing theoretical distributions could be also considered 
from a slightly different point of view. Here again we require that the theo- 
retical distribution fits satisfactorily the empirical data. But we may legit- 
imately require something else: an “explanation” of the machinery producing 
the empirical distributions of a given kind. I have enclosed the word ‘‘explana- 
tion” in quotation marks so as not to suggest that I am attaching to it too much 
importance. Mathematics is always dealing with the conceptual sphere which 
is quite distinct from the perceptual and, at most, admits the possibility of 
establishing some correspondence. Therefore, however hard we try, we can 
never produce anything like a real mathematical explanation of any phenomena 
but instead only some “interpolation formula”, some system of conceptions
and hypotheses, the consequences of which are approximately similar to the 
observable facts. But this similarity may be differently placed. In the case of 
Pearson’s curves it applies to the shape of these curves and to the shape of the 
empirical histograms. Otherwise it may apply to certain real features of the 
phenomena studied and to some mathematically described model of the same 
phenomena. And if the theoretical distributions deduced from the mathe- 
matical model do agree with those that we observe, and if that agreement is 
more or less permanent, we say that the mathematical model has “explained” 
the origin of the distributions. 
If the problem of deducing interpolation formulae, sufficiently flexible to
represent adequately a class of distributions, is of considerable interest, then
that of producing similar formulae but involving an “explanation” of the
phenomena studied, seems to be still more interesting. Of course, for it to be
considered as successfully solved, the theoretical distributions deduced must fit
the empirical ones, of a clearly specified kind, “practically always”. At the
present time we may quote a number of instances where it was possible to establish
a mathematical probabilistic model of some class of phenomena determining
probability laws which fit the empirical distributions with a remarkable accuracy.
Perhaps the most important class of these phenomena is provided by the
Mendelian theory; a number of other examples, although of a lesser importance
but still interesting, have been mentioned elsewhere [2]. In all of them successful
checks and rechecks increase our confidence that the conclusions based on the
mathematical model determining the theoretical distributions will satisfactorily
apply to observational data and also that our interpretation of various constants
is more or less correct.

Now, what is the situation with the contagious distributions deduced above?
They do represent an attempt to give good interpolation formulae involving an
“explanation” of the observable phenomena, and all the constants introduced
have meanings which are easy to interpret. Owing to the fact that in the
process of the larvae surviving and spreading over the field there are certain
unknown features, the final general formula that we have deduced involves
two arbitrary functions $p(n)$ and $P(\xi, \eta)$. By substituting for them any appropriate
functions that the intuition may suggest, we can obtain a number of
distributions, each of which may or may not provide a satisfactory interpolation
formula. Whether they do or not, must be empirically tested.

Up to the present time the contagious distributions of type A were tried on
12 distributions of larvae and on three distributions of yeast cells in squares
of the haemacytometer, which did not quite agree with the Poisson Laws.
The results of these trials were always the same: the type A distribution
with two parameters provided an excellent fit, which was never worse than that
of the more elaborate distribution with three parameters. This circumstance
seems encouraging, but future experience may be less satisfactory and it would
be very desirable to have some more empirical distributions and checks.

The following table gives two empirical distributions fitted with Poisson Law
and with its generalization, as provided by the type A distribution with two
parameters.


TABLE I

Distribution of European corn borers in 120 groups of 8 hills each (data provided
by Dr. Beall), fitted by Poisson Law and by type A Law with two parameters

No. of borers    Observed
0                24
1                16
2                16
3                18
4                15
5                 9


TABLE II

Distribution of yeast cells in 400 squares of haemacytometer observed by “Student”
(1907), fitted by Poisson Law and by type A Law with two parameters

No. of cells    Exp. P. L.    Observed    Exp. T. A.
0               202           213         214.
1               138           128         121.
2                47            37          45.
3                11            18          18.

ON CONFIDENCE LIMITS AND SUFFICIENCY, WITH PARTICULAR
REFERENCE TO PARAMETERS OF LOCATION


By B. L. WELCH


1. Introduction. The solution of the problem of estimating an interval in 
which a population parameter should lie, by means of what is now often termed 
the fiducial type of argument, dates back to the early writers on the theory of 
errors. However, owing to their lack of “Student’s” z distribution, their state- 
ments were usually only of an approximate character, and, furthermore, the 
logical distinction between the fiducial method and the method of inverse proba- 
bility was never clearly drawn, before R. A. Fisher discussed the subject. It is 
of interest to note how far “Student” himself went in this matter. In describing
the tables which he gave in his original paper he says:¹

“The tables give the probability that the value of the mean, measured from
the mean of the population, in terms of the standard deviation of the sample,
will lie between $-\infty$ and $z$. Thus, to take the tables for samples of six, the
probability of the mean of the population lying between $-\infty$ and once the standard
deviation of the sample is 0.9622 or the odds are about 24 to 1 that the mean
of the population lies between these limits. The probability is therefore 0.0378
that it is greater than once the standard deviation, and 0.0756 that it lies outside
± 1.0 times the standard deviation.”

It should be noted that “Student’s” $z$ is $(\bar{x} - \theta)/s$ where $\theta$ is the true population
mean. His tables tell us that for $n = 6$, $P\{z < 1\}$² is equal to 0.9622.
Owing to the symmetry of the $z$ distribution this is equivalent to saying that
$P\{z > -1\}$ is 0.9622, i.e.

        $P\left\{\frac{\bar{x} - \theta}{s} > -1\right\} = 0.9622$.

This may be transposed to read

(1)    $P\{\theta < \bar{x} + s\} = 0.9622$


which is the statement I have italicized in the above extract, it being there understood
that the mean of the population is being measured from the mean of the
sample. “Student” therefore makes here what is now called a fiducial statement.
In the next sentence he, in effect, attaches a probability to an interval
estimate for the population mean. In doing this “Student” was not conscious
of introducing any new principle, nor does he apply the method consistently
to other problems of estimation. For instance, in discussing the estimation
of the correlation coefficient $\rho$ about the same time, he formulates the problem
in terms of inverse probability, although he was fully aware of the difficulties
involved in postulating an a priori distribution for $\rho$.


1 “Student” (1908). “The Probable Error of a Mean.” Biometrika VI, p. 20.
2 $P$ is used to denote the probability of the truth of the relation in the bracket following.
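
The figure 0.9622 quoted from “Student” can be checked numerically. The sketch below assumes the usual relation between “Student’s” $z$ and the modern $t$ with $n - 1$ degrees of freedom, $t = z\sqrt{n - 1}$ (which holds when $s$ is computed with divisor $n$), and uses the closed form of the $t$ distribution function for 5 degrees of freedom. The function names are hypothetical.

```python
import math


def student_t_cdf_5df(t):
    """Distribution function of t with 5 degrees of freedom (closed form)."""
    theta = math.atan(t / math.sqrt(5.0))
    cos_t = math.cos(theta)
    series = cos_t + (2.0 / 3.0) * cos_t ** 3
    return 0.5 + (theta + math.sin(theta) * series) / math.pi


def student_z_cdf_n6(z):
    """P{z < given value} for samples of six, assuming t = z * sqrt(n - 1)."""
    return student_t_cdf_5df(z * math.sqrt(5.0))


if __name__ == "__main__":
    p = student_z_cdf_n6(1.0)
    print(round(p, 4), round(1.0 - p, 4), round(2.0 * (1.0 - p), 4))
```

Under these assumptions the three printed values correspond to the probabilities 0.9622, 0.0378 and 0.0756 of the extract.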

In discussing the problem of interval estimation more generally, I shall adopt
some of the terminology used by J. Neyman.³ The sample observations
$x_1, x_2, \dots, x_n$ will be denoted collectively by $E$ (standing for the “event” point
when the observations are represented as coördinates in a space of $n$ dimensions).
Then if $\theta$ is an unknown parameter, $\alpha$ a fixed probability, and $F(E, \theta, \alpha)$ a function
such that

(2)    $P\{F(E, \theta, \alpha) > 0\} = \alpha$

we may obtain an interval estimate for $\theta$ as follows. Let $\delta(E, \alpha)$ denote the
set of values of $\theta$ such that for any $\theta$ in the set we have $F(E, \theta, \alpha) > 0$. Then
if we use the notation $\{\delta(E, \alpha)\ C\ \theta\}$ to indicate that the set $\delta(E, \alpha)$ contains or
“covers” the true parameter $\theta$ we shall be able to rewrite (2)

(3)    $P\{\delta(E, \alpha)\ C\ \theta\} = \alpha$.

We can then adopt the following rule to obtain an interval estimate for $\theta$: (a) calculate
from the sample the set $\delta(E, \alpha)$, (b) make the statement that $\delta(E, \alpha)$
covers $\theta$. In adopting this rule we shall be right in the proportion $\alpha$ of cases.

There are, in general, an infinite number of ways in which we can start with 
a statement of the type (2) to reach the statement of type (3). Neyman has 
discussed methods of making the best choice between such statements. His 
approach to this problem may be illustrated by the following example. 

Suppose we have a random sample of $n$ from a normal population with standard
deviation $\sigma$ and let

        $s$ = standard deviation of the sample
and     $w$ = range = largest $x$ − smallest $x$.

Then we can find a constant $b_\alpha$ such that

(4)    $P\left\{\frac{s}{\sigma} > b_\alpha\right\} = \alpha$

and, turning this round, we obtain

(5)    $P\left\{\sigma < \frac{s}{b_\alpha}\right\} = \alpha$.

This means that, if we choose $\alpha = .99$ (say), then we can say that $\sigma$ is less than
$s/b_{.99}$ and in 99% of cases we shall be correct in this statement.


3 J. Neyman (1937). “Outline of a theory of statistical estimation based on the classical
theory of probability.” Phil. Trans. Roy. Soc. A 236, pp. 333-380.
Now similarly we can find $c_\alpha$ such that

(6)    $P\left\{\frac{w}{\sigma} > c_\alpha\right\} = \alpha$

and reversing this

(7)    $P\left\{\sigma < \frac{w}{c_\alpha}\right\} = \alpha$.

This statement is not inconsistent with (5). It means that, if we choose to base
our rule of estimation always on the range, then in 99% of cases we shall be
correct in saying that $\sigma < w/c_{.99}$. On the other hand, (5) relates to the consequences
of applying always a rule of estimation based on the standard deviation
of the sample. Both (5) and (7) are in themselves true statements, but we must 
decide which of them is the better one to use. In certain circumstances speed 
of calculation may be the determining factor, in which case (7) may be prefer- 
able, but here we shall assume that the time spent on calculation is not im- 
portant. 

In making the statement that o is less than some upper limit which is a func- 
tion of the sample observations, we shall, in general, prefer that this upper 
limit be placed as low as possible consistent with the chosen confidence coefficient
$\alpha$. We find, however, that it is not possible to say that, whatever the
sample obtained, $s/b_\alpha$ will be less than $w/c_\alpha$ or vice versa. We must, therefore,
approach the problem from another angle. If $\sigma'$ is a value greater than the true
standard deviation $\sigma$ we can theoretically evaluate the probability that $\sigma' < s/b_\alpha$,
and similarly the probability that $\sigma' < w/c_\alpha$. We may now express our general
desire to place the upper confidence limit for $\sigma$ as low as possible in a more concrete
form. We may ask that the probability that $\sigma'$ is less than this limit should
be as small as possible. We find in the present problem that, whatever $\sigma' > \sigma$,
we should include $\sigma'$ in the interval from 0 to $s/b_\alpha$ less frequently than we should
in an interval based on any other statistic. This constitutes an argument for
using $s$ rather than any other statistic such as $w$.
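
The comparison between the limits $s/b_\alpha$ and $w/c_\alpha$ may be illustrated by a rough simulation. The sketch below is not part of the original argument and rests on several assumptions: the constants $b_\alpha$ and $c_\alpha$ are estimated empirically from simulated normal samples rather than taken from tables, $s$ is computed with divisor $n$, the true $\sigma$ is set to 1, and the function name and default settings are hypothetical.

```python
import math
import random


def compare_limits(n=10, alpha=0.95, sigma_false=1.5, reps=20000, seed=1):
    """Monte Carlo sketch comparing upper limits for sigma based on s and on w."""
    rng = random.Random(seed)

    def draw():
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]   # true sigma = 1
        mean = sum(xs) / n
        s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)  # divisor n (assumption)
        w = max(xs) - min(xs)
        return s, w

    # b_alpha, c_alpha: lower (1 - alpha) quantiles of s/sigma and w/sigma,
    # estimated from a first batch of samples.
    calib = [draw() for _ in range(reps)]
    idx = int((1.0 - alpha) * reps)
    b = sorted(s for s, _ in calib)[idx]
    c = sorted(w for _, w in calib)[idx]

    # Evaluate the two rules on a fresh batch of samples.
    fresh = [draw() for _ in range(reps)]
    return {
        "cover_s": sum(s / b > 1.0 for s, _ in fresh) / reps,
        "cover_w": sum(w / c > 1.0 for _, w in fresh) / reps,
        "false_s": sum(s / b > sigma_false for s, _ in fresh) / reps,
        "false_w": sum(w / c > sigma_false for _, w in fresh) / reps,
    }


if __name__ == "__main__":
    print(compare_limits())
```

Both rules should cover the true $\sigma$ in roughly the proportion $\alpha$ of samples, while the frequency with which the false value $\sigma'$ falls below the limit is expected to be no larger for the rule based on $s$ than for the rule based on $w$, up to simulation error.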

In general, Neyman makes all problems of choosing between alternative 
procedures of interval estimation depend on the probability that the intervals 
include values of the parameter different from the true value, as well as on the 
probability of them containing the true value. This principle of choice does, 
I think, appear reasonable, although its application is not, of course, so straight- 
forward when statistics with properties of sufficiency similar to those of s do 
not exist. It is then necessary to introduce other conditions into the formula- 
tion of the problem. I intend to discuss elsewhere ways in which this has 
been done. 

To summarize, we may say: (a) we can make many true statements of the 
type (3); and (b) if we can agree on certain further properties which these state- 
ments should possess, we can choose which is the best statement of this type to 
adopt as our general rule for interval estimation. There are certain differences 
between this approach and that of R. A. Fisher, whose attitude is expressed
clearly in his contribution to the discussion following Neyman’s paper⁴ “On the
two different aspects of the representative method.” Fisher says there that:
“In particular he would apply the fiducial method, or rather would claim unique 
validity for its results, only in those cases for which the problem of estimation 
proper had been completely solved, i.e. either when there existed a statistic of 
the kind called sufficient, which in itself contained the whole of the information 
supplied by the data, or when, though there was no sufficient statistic, yet the 
whole of the information could be utilized in the form of ancillary information.” 
Thus it appears that when sufficient statistics do not exist, excepting in those 
further cases where Fisher claims that the problem of estimation has been com- 
pletely solved, he would definitely discourage the use of the fiducial argument 
at all. Neyman, on the other hand, would allow the attempt to obtain interval 
estimates on the lines described above. Where sufficient statistics do exist, 
the two approaches do not lead to any final disagreement. Neyman, using 
results obtained in the Neyman-Pearson theory of testing hypotheses, is led to 
criteria depending in a particular way on the joint probability law of the sample, 
and these criteria are seen to involve the sample values only through statistics 
which have been defined as sufficient. One may regard this fact in two ways: 
(a) one may say that because a certain line of approach, which seems intuitionally
sound, leads to the use of statistics which have been defined as sufficient,
therefore this definition of sufficiency is a good one, or (b) one may say that the 
definition of sufficient statistics is fundamental, and that any method of approach 
which leads to their use has thereby obtained some extra support. 


There remains the case alluded to above, where the joint probability law of 
the sample does not depend on the unknown parameter $\theta$ by way of one statistic
only, but where nevertheless it has been said that the problem of estimation 
has been completely solved. This case will be discussed in the next section. 


2. Interval Estimates of Location. R. A. Fisher has given, as a particular 
example, a case where the unknown parameter is one of location, so that we can 
write 


        $p(x \mid \theta) = \phi(x - \theta)$.

Now if we have a sample of $n$ from this distribution, the $(n - 1)$ differences
between successive observations when arranged in order of magnitude will have
a joint distribution independent of $\theta$. Hence if we denote the sample by $E$,
and the $(n - 1)$ differences jointly by $C$, we have

(8)    $p(E \mid \theta) = p(T \mid C, \theta)\, p(C)$

where $T$ is some statistic, such as the mean or median, whose distribution does
depend on $\theta$ and may hence be taken as an estimate of $\theta$. We may therefore
read (8) as follows: the joint probability law of the sample is equal to the probability
law of the estimate in samples of the same configuration, $C$, multiplied
by the probability of the configuration, the latter not depending on the unknown
$\theta$. From this it has been deduced that all the information respecting $\theta$
provided by the sample is given by referring $T$ to the distribution $p(T \mid C, \theta)$.
Fisher,⁵ for instance, says that “in interpreting our estimate (we) may take as its
sampling distribution that appropriate to only those samples which have the
actual configuration observed.” Later in the same context he remarks that in
general, when $\theta$ is a parameter of any type whatever, and not necessarily one of
location or scaling, if something can be found “corresponding with the configuration
of the sample in the simple case discussed above, ... one of the
primary problems of uncertain inference will have reached its complete solution.
If not, there must remain some further puzzles to unravel.”


4 J. Neyman (1934). J. R. Statist. Soc. 97, p. 617.

It is clear, therefore, that more has been claimed for this method than that it is 
practically useful, or that it yields the best results possible in large samples, or 
that it yields results highly approximating the best possible in small samples. 
There is an emphasis here on completeness that leads one to suppose that all 
problems of estimation and testing hypotheses may be answered to the best 
advantage by considering only the distribution of an estimate in samples of the 
same configuration, the estimate thus attaining properties analogous to those
of a sufficient statistic. That this supposition is not true may be seen by con- 
sidering the following simple example. This example concerns the simplest 
situation that one deals with in the theory of testing statistical hypotheses. 
Its relevance to the problem of interval estimation will, however, not be difficult 
to see. 

Suppose that we have a sample from a population involving only a parameter
of location $\theta$, and that we wish to test whether $\theta$ is equal to $\theta_0$ (say), and that
besides $\theta_0$ there is only one value $\theta_1$ (say) which it is possible for $\theta$ to take. Suppose
we require to set up a statistical test which will reject the hypothesis
$\theta = \theta_0$ in only a small proportion $\epsilon$ of cases, when it is true. Many such tests
are possible, and it is natural to choose from them that test which will lead most
frequently to the rejection of the hypothesis that $\theta = \theta_0$ when the single alternative
$\theta = \theta_1$ is true. Neyman and Pearson⁶ have shown that the best test from
this viewpoint is provided by the criterion

(9)    $J = \frac{p(E \mid \theta_1)}{p(E \mid \theta_0)}$.

This criterion must be referred to its distribution in all samples when $\theta = \theta_0$.
We must therefore choose a constant $J_\epsilon$ such that

(10)    $P\{J > J_\epsilon \mid \theta = \theta_0\} = \epsilon$

and reject the hypothesis that $\theta = \theta_0$ when $J > J_\epsilon$. This is known to be the
best test in these circumstances, and we may demand that any other procedure
which claims to use the data exhaustively should be equivalent to it. Now if we
decide to use only the distribution of the statistic $T$ in samples of the same
configuration, we are led to take as the most powerful test based on $T \mid C$ one
which would reject the hypothesis that $\theta = \theta_0$ when the ratio of $p(T \mid C, \theta_1)$ to
$p(T \mid C, \theta_0)$ exceeds a certain value. Now by (8) this ratio is exactly the criterion
$J$ of (9) above. There is, however, this difference, that $J$ has now to be referred
to its distribution in samples with the same configuration $C$ as that observed.
We shall therefore have to choose $J_\epsilon(C)$ such that

(11)    $P\{J > J_\epsilon(C) \mid C, \theta_0\} = \epsilon$.


5 Fisher, R. A. (1936). “Uncertain Inference.” Proc. Amer. Acad. Arts and Sciences,
71, No. 4, p. 257.

6 J. Neyman and E. S. Pearson (1932). “On the problem of the most efficient tests of
statistical hypotheses.” Phil. Trans. Roy. Soc. A 231, p. 300.


A test, then, which rejects the hypothesis that $\theta = \theta_0$ when $J > J_\epsilon(C)$ will 
be such that it is the most powerful possible with respect to the alternative 
$\theta = \theta_1$, based on samples with the same configuration. However, in actual 
sampling from a population, we derive samples with all configurations, and the 
real power of the test will therefore be measured by 

$$P\{J > J_\epsilon(C) \mid \theta_1\} = \int P\{J > J_\epsilon(C) \mid C, \theta_1\}\, p(C)\, dC. \tag{12}$$

This quantity cannot be greater, and will in general be less, than the power⁷ 
of the other test, viz. $P\{J > J_\epsilon \mid \theta_1\}$. (If $J_\epsilon(C)$ is the same for all $C$, and therefore 
equal to $J_\epsilon$, the powers will be equal. This will be the case when there is a 
sufficient statistic for $\theta$.) We must therefore conclude that, in relation to this 
simple problem at least, a method which takes account only of distributions in 
samples with the same configuration will not use the data to the best advantage. 
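
The inequality between the two powers can be made explicit in one extra step. The following is only a sketch, assuming (as in (12)) that the distribution $p(C)$ of the configuration does not depend on $\theta$ and that the conditional test has size $\epsilon$ within every configuration:

```latex
% Averaging (11) over configurations shows the conditional test has overall size epsilon:
\[
P\{J > J_\epsilon(C) \mid \theta_0\}
  = \int P\{J > J_\epsilon(C) \mid C, \theta_0\}\, p(C)\, dC
  = \epsilon \int p(C)\, dC
  = \epsilon .
\]
% It is therefore one of the size-epsilon tests over which the criterion (9)-(10)
% is most powerful against theta_1, so that
\[
P\{J > J_\epsilon(C) \mid \theta_1\} \;\le\; P\{J > J_\epsilon \mid \theta_1\}.
\]
```

Equality requires the two rejection regions to coincide (apart from sets of probability zero), which is the case when $J_\epsilon(C)$ does not vary with $C$.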

Of course the type of problem to be solved is usually not so straightforward 
as the present one. There will usually be more than one value of $\theta$ alternative 
to $\theta_0$, and no uniformly most powerful test will, in general, exist. It is legitimate, 
however, to consider the above example, because any procedure claiming 
properties of sufficiency should be able to deal with it in the best possible way. 

An example may make the above points clearer, and will show their relevance 
to the problem of interval estimation. Consider a rectangular distribution 
with mean $\theta$, and range from $(\theta - \tfrac12)$ to $(\theta + \tfrac12)$. Let $x_1$ and $x_2$ be a sample of 2 
from this population, and suppose we require confidence limits for $\theta$ such that 
the chance of them enclosing $\theta$ is $\alpha$. 

If we represent $x_1$ and $x_2$ as coordinates of a point with respect to rectangular 
axes, the joint probability distribution is constant over a square centered at the 
point $(\theta, \theta)$. This is shown by ABCD in Fig. 1. We have 

$$p(x_1, x_2)\, dx_1\, dx_2 = dx_1\, dx_2, \qquad \theta - \tfrac12 < x_1 < \theta + \tfrac12, \quad \theta - \tfrac12 < x_2 < \theta + \tfrac12. \tag{13}$$


7 Power is used throughout in the Neyman-Pearson sense, i.e. to denote the chance of a 
test rejecting a hypothesis when a given alternative is actually true. 

If we write $z_1 = \tfrac12(x_1 + x_2)$; $z_2 = \tfrac12(x_1 - x_2)$, $z_2$ will represent the configuration 
of the sample, and $z_1$ may be taken as the estimate, $T$, of $\theta$ in our discussion 
above. We can then show that 

$$p(z_1, z_2)\, dz_1\, dz_2 = 2\, dz_1\, dz_2, \tag{14}$$

$$p(z_2)\, dz_2 = 2\{1 - 2|z_2|\}\, dz_2, \qquad -\tfrac12 < z_2 < \tfrac12, \tag{15}$$

and 

$$p(z_1 \mid z_2)\, dz_1 = \frac{dz_1}{1 - 2|z_2|}, \qquad \theta - \tfrac12 + |z_2| < z_1 < \theta + \tfrac12 - |z_2|. \tag{16}$$

That these are the correct limits for $z_1$ and $z_2$ may be seen by reference to Fig. 1, 
noting that $z_1$ and $z_2$ are constant along lines parallel to the respective diagonals 
BD and AC of the square. 


Fig. 2 


First let us confine ourselves to samples with the same configuration $z_2$. 
Then, from (16), we can say that 

$$P\{\theta - \alpha(\tfrac12 - |z_2|) < z_1 < \theta + \alpha(\tfrac12 - |z_2|)\} = \alpha. \tag{17}$$

This statement is true for given $z_2$, and will be a fortiori true when this restriction 
is removed. It is equivalent to saying that the chance of a point falling into 
the shaded area in Fig. 1 is $(1 - \alpha)$, where $\alpha$ denotes the proportion of the 
diagonal AC lying in the non-shaded area.⁸ Confidence limits for $\theta$ are then 
obtained by transposing (17), giving 

$$P\{z_1 - \alpha(\tfrac12 - |z_2|) < \theta < z_1 + \alpha(\tfrac12 - |z_2|)\} = \alpha. \tag{18}$$

8 We are assuming that confidence limits are required such that the chance is $(\tfrac12 - \tfrac{\alpha}{2})$ 
of $\theta$ being above the upper limit, and $(\tfrac12 - \tfrac{\alpha}{2})$ of it being below the lower limit. 

That this is not the best way of constructing confidence limits is seen as follows. 
Let us denote the lesser of $x_1$ and $x_2$ by $x_L$, and the greater by $x_G$. Then if we 
consider the possible values of $x_L$ and $x_G$ which will satisfy simultaneously the 
inequalities 

$$\begin{aligned}
\theta - \tfrac12 &< x_L < \theta + \tfrac12 - \sqrt{\tfrac12 - \tfrac{\alpha}{2}}, \\
\theta - \tfrac12 + \sqrt{\tfrac12 - \tfrac{\alpha}{2}} &< x_G < \theta + \tfrac12,
\end{aligned} \tag{19}$$

we see that they lie in the non-shaded area of the square ABCD in Fig. 2, where 
the sides of the shaded squares are $\sqrt{\tfrac12 - \tfrac{\alpha}{2}}$. The chance of the inequalities 
holding simultaneously is therefore $\alpha$. Further we see that these inequalities 
can be transposed to read 

$$\begin{aligned}
x_G - \tfrac12 < \theta < x_L + \tfrac12 \quad &\text{when } (x_G - x_L) > \sqrt{\tfrac12 - \tfrac{\alpha}{2}}, \\
x_L - \tfrac12 + \sqrt{\tfrac12 - \tfrac{\alpha}{2}} < \theta < x_G + \tfrac12 - \sqrt{\tfrac12 - \tfrac{\alpha}{2}} \quad &\text{when } (x_G - x_L) < \sqrt{\tfrac12 - \tfrac{\alpha}{2}},
\end{aligned} \tag{20}$$

and therefore we can take these to define our confidence limits for $\theta$. 

The intervals defined by the confidence limits in (18) and (20) are equivalent 
in the sense that each covers the true value of $\theta$ in a proportion $\alpha$ of cases. To 
decide which is the better rule of interval estimation we shall follow Neyman, 
and consider how often the intervals cover values other than the true $\theta$. In 
particular let $(\theta + \Delta)$ be any other value, and consider the expressions $P_1$ and 
$P_2$ where 

$$P_1 = P\{z_1 - \alpha(\tfrac12 - |z_2|) < (\theta + \Delta) < z_1 + \alpha(\tfrac12 - |z_2|)\} \tag{21}$$

and $P_2$ is the probability that one or another of the following inequalities holds 

$$\begin{aligned}
x_G - \tfrac12 < (\theta + \Delta) < x_L + \tfrac12 \quad &\text{when } (x_G - x_L) > \sqrt{\tfrac12 - \tfrac{\alpha}{2}}, \\
x_L - \tfrac12 + \sqrt{\tfrac12 - \tfrac{\alpha}{2}} < (\theta + \Delta) < x_G + \tfrac12 - \sqrt{\tfrac12 - \tfrac{\alpha}{2}} \quad &\text{when } (x_G - x_L) < \sqrt{\tfrac12 - \tfrac{\alpha}{2}}.
\end{aligned} \tag{22}$$

Now (21) can be written 

$$P_1 = P\{(\theta + \Delta) - \alpha(\tfrac12 - |z_2|) < z_1 < (\theta + \Delta) + \alpha(\tfrac12 - |z_2|)\}. \tag{23}$$

Referring to Fig. 1 we see that we have to evaluate the chance of the sample 
falling into a lozenge-shaped area like the unshaded area in ABCD, but moved 
bodily along the diagonal AC to such a position as is indicated by the dotted 
lines. Difficulties are introduced by the discontinuities, but we can show that 
for $\Delta$ positive 

$$P_1 = \alpha \ \text{ when } \Delta = 0, \qquad \ldots, \qquad P_1 = 0 \ \text{ when } \Delta \ge \left(\tfrac12 + \tfrac{\alpha}{2}\right),$$

with similar expressions for $\Delta$ negative. The graph of $P_1$ against $\Delta$ is shown 
in Fig. 3, $\alpha$ for convenience being taken $= 0.92$. From it we can read off the 
probability of the confidence interval covering $(\theta + \Delta)$, where $\theta$ is the true value 
of the parameter. 

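Similar calculations may be made for $P_2$. Without going into details, it is 
seen that 

$$P_2 = \begin{cases}
\alpha & \text{when } \Delta = 0, \\[2pt]
\alpha - 2\Delta\left(1 - \sqrt{\tfrac12 - \tfrac{\alpha}{2}}\right) & \text{when } 0 \le \Delta \le \sqrt{\tfrac12 - \tfrac{\alpha}{2}}, \\[2pt]
\left(\tfrac12 + \tfrac{\alpha}{2}\right) - 2\Delta + \Delta^2 & \text{when } \sqrt{\tfrac12 - \tfrac{\alpha}{2}} \le \Delta \le 1 - \sqrt{\tfrac12 - \tfrac{\alpha}{2}}, \\[2pt]
0 & \text{when } \Delta \ge 1 - \sqrt{\tfrac12 - \tfrac{\alpha}{2}}.
\end{cases}$$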


$P_2$ is plotted against $\Delta$ in Fig. 3. It is seen that, whatever value of $\Delta$ we take, 
the chance of $(\theta + \Delta)$ being included in the confidence interval is less for the 
second method of estimation than it is for the method based on the distribution 
of $z_1 \mid z_2$.⁹ This circumstance would, I think, contradict the view that the latter 
method was deriving the utmost from the sample. Whether the method is 
still a good one, though not necessarily the best, is not a question at issue in the 
present paper. The curves in Fig. 3 are very close together, and we are led to 
expect this by the fact that (12) is the weighted mean of the powers within the 
separate configurations, the weights being the probabilities $p(C)$ of the configurations. 
I am only concerned to show that certain methods, for which 
properties analogous to those of sufficiency have been claimed, do not satisfy 
conditions which I think they should, if these claims are to be upheld. 

9 It will be noted that, when inverted, the curves of Fig. (iii) represent the power functions 
of tests for which the regions of rejection are those in figures (i) and (ii) respectively, 
the test being whether the parameter has the specified value $\theta$, and different alternative 
hypotheses being represented by $(\theta + \Delta)$. 
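
The comparison of the two interval rules can be checked numerically. The Python sketch below is an illustration only: the function names, the seed and the number of trials are assumptions of the sketch and do not come from the paper. It draws samples of 2 from the rectangular distribution, forms the limits (18) and (20), and estimates how often each interval covers $\theta + \Delta$. It assumes $\alpha \ge \tfrac12$, so that the two shaded squares of Fig. 2 do not overlap.

```python
import math
import random

def conditional_interval(x1, x2, alpha):
    """Limits (18): based on z1 within samples of the same configuration z2."""
    z1 = 0.5 * (x1 + x2)
    z2 = 0.5 * (x1 - x2)
    half_width = alpha * (0.5 - abs(z2))
    return z1 - half_width, z1 + half_width

def unconditional_interval(x1, x2, alpha):
    """Limits (20): based on the smaller and the larger observation."""
    s = math.sqrt(0.5 - 0.5 * alpha)      # side of each shaded square in Fig. 2
    x_l, x_g = min(x1, x2), max(x1, x2)
    if x_g - x_l > s:
        return x_g - 0.5, x_l + 0.5
    return x_l - 0.5 + s, x_g + 0.5 - s

def cover_rates(alpha, delta, trials=200_000, seed=12345, theta=0.0):
    """Estimate (P1, P2): how often each interval covers theta + delta.

    Both rules are applied to the same simulated samples, so the
    difference between them is less noisy than with independent runs.
    """
    rng = random.Random(seed)
    target = theta + delta
    hits1 = hits2 = 0
    for _ in range(trials):
        x1 = theta - 0.5 + rng.random()
        x2 = theta - 0.5 + rng.random()
        lo, hi = conditional_interval(x1, x2, alpha)
        hits1 += lo < target < hi
        lo, hi = unconditional_interval(x1, x2, alpha)
        hits2 += lo < target < hi
    return hits1 / trials, hits2 / trials

if __name__ == "__main__":
    for delta in (0.0, 0.1, 0.4, 0.7):
        p1, p2 = cover_rates(alpha=0.92, delta=delta)
        print(f"delta={delta:.1f}  P1~{p1:.4f}  P2~{p2:.4f}")
```

With `delta = 0` both estimates should be close to $\alpha$. For other values of `delta` the two estimates differ only slightly, so a large number of trials is needed before the ordering of $P_1$ and $P_2$ shows clearly above the simulation noise.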


3. Fiducial Distributions. In the first section of this paper I discussed certain 
points of difference between the approaches to the problem of interval estimation 
made by R. A. Fisher on the one hand and J. Neyman and E. S. Pearson on the 
other. The differences are not, perhaps, of the same magnitude as those between 
all these writers and the protagonists of inverse probability, and the results 
reached are so often the same that the reader may be excused for being somewhat 
impatient with what appear to be rather fine distinctions. However, 
as was seen in the last section, the approaches do not always yield exactly the 
same final results, and therefore I think it may be profitable to discuss them 
still further. 


Fig. 3 

Closely connected with Fisher’s desire to restrict the use of the fiducial method 
to situations where statistics exist which possess some property of sufficiency, 
is his introduction of the concept of a fiducial distribution for the unknown 
parameter. One can talk about the fiducial distribution for a parameter only 
if it is a unique distribution. Neyman, however, never makes use of fiducial 
distributions, and would, I think, claim that any valid results reached with the 
concept can equally well be reached without it. Where the results are the same 
there is room for two opinions on this matter. Some writers find it convenient 
to think in terms of fiducial distributions, and others prefer always to carry 
forward their reasoning as far as possible in terms of direct probability statements 
about the observational values, before transposing them to obtain confidence 
or fiducial limits for the parameters. 

Greater objection can be made to the use of simultaneous fiducial distributions 
of several parameters. For instance, in the case of the normal distribution 
with parameters $\mu$ and $\sigma$, a simultaneous fiducial distribution has been defined 
in the following way.¹⁰ Starting with the fact that the joint distribution of 

$$\phi_1 = \frac{\sqrt{n}\,(\bar{x} - \mu)}{\sigma} \quad \text{and} \quad \phi_2 = \frac{(n-1)s^2}{\sigma^2}$$

is 

$$df = \frac{1}{\sqrt{2\pi}\; 2^{\frac{n-1}{2}}\, \Gamma\!\left(\frac{n-1}{2}\right)}\; e^{-\frac12(\phi_1^2 + \phi_2)}\, \phi_2^{\frac{n-3}{2}}\, d\phi_1\, d\phi_2,$$

$\bar{x}$ and $s$ are treated formally as fixed, and $\phi_1$ and $\phi_2$ are transformed to $\mu$ and $\sigma$, 
treated formally as variables. This gives 

$$df = \frac{2\sqrt{n}\,\{(n-1)s^2\}^{\frac{n-1}{2}}}{\sqrt{2\pi}\; 2^{\frac{n-1}{2}}\, \Gamma\!\left(\frac{n-1}{2}\right)\sigma^{n+1}}\; \exp\left\{-\frac{n(\bar{x} - \mu)^2 + (n-1)s^2}{2\sigma^2}\right\} d\mu\, d\sigma. \tag{26}$$

This distribution would be useful if it were legitimate to integrate it out to obtain 
a fiducial distribution for any function $g(\mu, \sigma)$ say, of $\mu$ and $\sigma$. However, as for 
instance Bartlett has pointed out, this is not necessarily permissible. It seems 
to me therefore, that distributions defined as in (26) should be dispensed with 
entirely, for their very form encourages the belief that they can be integrated 
out at will. That this belief is still held is illustrated by a recent paper by 
Miss D. M. Starkey¹¹ concerned with the difference between the means of normal 
populations where the standard deviations are not assumed equal. This is the 
original problem to which Fisher¹² applied a method equivalent to integrating 
out the joint fiducial distribution of the two population means. Bartlett¹³ 
raised an objection to this method of treatment, and I have also discussed the 
matter further.¹⁴ Miss Starkey proceeds from the assumption that Fisher's 
method is sound. 

The concept of the fiducial distribution has also been used in those problems 
of location and scaling, which have been treated by the procedure discussed 
above, of considering distributions in samples with the same configuration. 
Indeed it is one of the attractions of this procedure that we are led to distributions 
with, so to speak, one degree of freedom, so that the fiducial method may be 
safely applied. However, although probability statements based on such a 
fiducial method are here quite valid, I do not think that such statements can 
claim a unique validity. As I have shown in the previous section, there is no 
necessity to confine oneself to sampling within a configuration in order to obtain 
interval estimates for parameters, and we may fare better by not so confining 
ourselves, even if we have to dispense with the fiducial distribution. 

10 R. A. Fisher (1935). "The fiducial argument in statistical inference." Ann. Eugen. 
VI, p. 395. 

11 Daisy M. Starkey (1937). "A test of the significance of the difference between means 
of samples from two normal populations without assuming equal variances." Ann. Math. 
Stat. Vol. IX, No. 3, pp. 201-213. 

12 R. A. Fisher (1935). loc. cit. 

13 M. S. Bartlett (1936). "The information available in small samples." Proc. Camb. 
Phil. Soc. 32, pp. 560-566. 

14 B. L. Welch (1937). "The significance of the difference between two means when the 
population variances are unequal." Biometrika, XXIX, p. 358. 


4. Summary. Certain points which arise in the problem of estimating an 
interval in which a population parameter should lie have been discussed. In 
the second section it has been shown that in estimating location parameters 
it is not sufficient to consider the distribution of estimates in samples of the 
same configuration, meaning by sufficient that the sample is thereby utilized 
in the best possible way. 


UNIVERSITY COLLEGE, 



THE REGRESSION SYSTEMS OF TWO SUMS HAVING RANDOM 
ELEMENTS IN COMMON 


By J. F. Kenney 


1. Introduction. The purpose of this note is to illustrate the power and 
elegance of the technique of characteristic functions¹ in solving a problem which 
has been discussed in the literature by Fischer² and others. 

Let $x_1, x_2, \dots, x_n$ be $n$ variables independent of each other in the statistical 
sense, all subject to the same distribution function $f$, so that the function 
representing their joint distribution is 

$$f(x_1) f(x_2) \cdots f(x_n). \tag{1}$$

Under these conditions a set of values $x_1, x_2, \dots, x_n$ will be said to constitute 
a sample of $n$ from a population with distribution function $f(x)$ and the function 
(1) will be said to represent the distribution of samples. It will be understood 
that $f(x)$ is defined and is non-negative for all real values of $x$ and 

$$\int_{-\infty}^{\infty} f(x)\, dx = 1.$$


If the actual occurrence of the variable is limited to a finite range, f(x) is defined 
as identically zero outside that range. 

The mathematical expectation of an arbitrary function $\psi(x)$, denoted by 
application of the operator $E$, is 

$$E[\psi(x)] = \int_{-\infty}^{\infty} \psi(x) f(x)\, dx. \tag{2}$$

This integral will be convergent whenever $\psi(x)$ is absolutely integrable and 
bounded. In particular, if $\psi(x) = x$ we have the mean 

$$a = \int_{-\infty}^{\infty} x f(x)\, dx$$

and it will be assumed that $a$ exists. 


Suppose a sample of $n$ is taken from the population represented by $f(x)$ and 
the sum 

$$y = x_1 + x_2 + \cdots + x_k + x_{k+1} + \cdots + x_n \tag{3}$$

is formed. From this sample $k \le n$ values are chosen at random, and a sample 
of $m - k$ ($m \le n$) additional values, $x'_j$, is taken from $f(x)$. The sum 

$$z = x_1 + x_2 + \cdots + x_k + x'_{k+1} + \cdots + x'_m \tag{4}$$

is then formed. The problem is to determine the regression systems of $z$ on $y$ 
and $y$ on $z$ in the population resulting from repeated samples. 

1 The writer takes pleasure in acknowledging his indebtedness to Professor A. T. Craig 
for suggesting this method. 

2 "On correlation surfaces of sums with a certain number of random elements in common," 
these ANNALS, vol. 4, no. 2, pp. 103-126. 

Before proceeding with the solution a brief discussion of characteristic func- 
tions will be given. 


2. Characteristic functions. When $\psi(x) = e^{itx}$, where $t$ is a real variable and 
$i = \sqrt{-1}$, (2) is called the characteristic function of $x$. Thus if we let $\varphi(t) = E(e^{itx})$ 
we have 

$$\varphi(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\, dx.$$

From the conditions imposed on $f(x)$ it follows that the integral defining $\varphi(t)$ 
is convergent and $|\varphi(t)| \le 1$. If the $k$th derivative of $\varphi(t)$ with respect to $t$ 
exists we have 

$$\left.\frac{d^k \varphi(t)}{dt^k}\right|_{t=0} = i^k \nu_k, \qquad \nu_k = \int_{-\infty}^{\infty} x^k f(x)\, dx.$$

Thus the characteristic function of $x$ has the property that its $k$th derivative 
at the origin (divided by $i^k$) gives the $k$th moment of the distribution of $x$ about 
the origin of $x$. 
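
As a small numerical illustration of this moment property, the Python sketch below uses the exponential density $f(x) = e^{-x}$, $x > 0$, whose characteristic function is $1/(1 - it)$. This choice of distribution, the finite-difference step and the function names are assumptions made for the example only; they are not part of the note.

```python
def phi(t):
    """Characteristic function of the exponential density f(x) = exp(-x), x > 0."""
    return 1.0 / (1.0 - 1j * t)

def moment_from_cf(cf, k, h=1e-3):
    """Approximate the kth moment (k = 1 or 2) as cf^(k)(0) / i^k.

    The derivative at the origin is replaced by a central difference,
    so the result is accurate only up to terms of order h^2.
    """
    if k == 1:
        derivative = (cf(h) - cf(-h)) / (2.0 * h)
    elif k == 2:
        derivative = (cf(h) - 2.0 * cf(0.0) + cf(-h)) / (h * h)
    else:
        raise ValueError("only k = 1 and k = 2 are sketched here")
    return (derivative / (1j ** k)).real

if __name__ == "__main__":
    print("first moment  ~", moment_from_cf(phi, 1))
    print("second moment ~", moment_from_cf(phi, 2))
```

For this density the first two moments about the origin are 1 and 2, and the finite-difference values should agree with them up to the discretisation error.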


The notion of characteristic function extends readily to a distribution of 
several variables. In particular, let $F(y, z)$ be the joint distribution function 
of variables $y$ and $z$ subject to the condition 

$$\int\!\!\int F(y, z)\, dy\, dz = 1.$$

Then the characteristic function of $F(y, z)$ is 

$$\varphi(t_1, t_2) = \int\!\!\int e^{i t_1 y + i t_2 z} F(y, z)\, dy\, dz \tag{5}$$

where $y$ and $z$ are defined in (3) and (4). 

3. Solution of the problem. The distribution function associated with the 
population of samples is of the form given by (1). Consequently, the characteristic 
function of $F(y, z)$ can be written in the form 

$$\int \cdots \int \prod_{j=1}^{k} e^{i(t_1 + t_2)x_j} f(x_j)\, dx_j \prod_{j=k+1}^{n} e^{i t_1 x_j} f(x_j)\, dx_j \prod_{j=k+1}^{m} e^{i t_2 x'_j} f(x'_j)\, dx'_j,$$

the limits of integration being taken over all admissible values of the variables. 
The above expression reduces to 

$$\varphi(t_1, t_2) = [\varphi(t_1 + t_2)]^k\, [\varphi(t_1)]^{n-k}\, [\varphi(t_2)]^{m-k}. \tag{6}$$


By the Fourier transform we have from (5), 

$$F(y, z) = (1/2\pi)^2 \int\!\!\int e^{-i t_1 y - i t_2 z} \varphi(t_1, t_2)\, dt_1\, dt_2.$$

Since a distribution is completely determined by its characteristic function, 
$F(y, z)$ can be exhibited if $f(x)$ is known. However, the solution of the problem 
does not depend upon exhibiting $F(y, z)$. 

Let $g(y)$ and $h(z)$ be the marginal distributions of $y$ and $z$, respectively. 
Then the mean value of $z$ for a fixed $y$ is 

$$\bar{z}_y = \int \frac{z F(y, z)}{g(y)}\, dz \tag{7}$$

and the mean value of $y$ for a fixed $z$ is 

$$\bar{y}_z = \int \frac{y F(y, z)}{h(z)}\, dy \tag{8}$$

where here and subsequently the integration is taken over all admissible values 
of the variables. 

Let us now take the partial derivative of $\varphi(t_1, t_2)$, as given in (5), with respect 
to $t_2$ and evaluate the result at $t_2 = 0$. We obtain 

$$\left.\frac{\partial \varphi(t_1, t_2)}{\partial t_2}\right|_{t_2 = 0} = \int\!\!\int i z\, e^{i t_1 y} F(y, z)\, dy\, dz. \tag{9}$$

If we denote the left member of (9) by $G(t_1)$ and utilize (7) in the right member, 
(9) becomes 

$$G(t_1) = \int i\, g(y)\, \bar{z}_y\, e^{i t_1 y}\, dy.$$

Application of the Fourier transform yields 

$$i\, g(y)\, \bar{z}_y = \frac{1}{2\pi} \int e^{-i t_1 y} G(t_1)\, dt_1. \tag{10}$$

Now from (6), 

$$G(t_1) = k\, \varphi(t_1)^{n-1} \varphi'(t_1) + \varphi(t_1)^n\, i a (m - k).$$

Therefore (10) may be written as follows, 

$$i\, g(y)\, \bar{z}_y = \frac{k}{2\pi} \int e^{-i t_1 y} \varphi(t_1)^{n-1} \varphi'(t_1)\, dt_1 + \frac{i a (m - k)}{2\pi} \int e^{-i t_1 y} \varphi(t_1)^n\, dt_1. \tag{11}$$

To evaluate these integrals, consider 

$$\varphi(t_1)^n = \int e^{i t_1 y} g(y)\, dy. \tag{12}$$

Differentiating (12) with respect to $t_1$ we have 

$$n\, \varphi(t_1)^{n-1} \varphi'(t_1) = \int i y\, e^{i t_1 y} g(y)\, dy. \tag{13}$$

Again using the Fourier transform, we obtain 

$$i y\, g(y) = \frac{n}{2\pi} \int e^{-i t_1 y} \varphi(t_1)^{n-1} \varphi'(t_1)\, dt_1$$

from (13) and 

$$g(y) = \frac{1}{2\pi} \int e^{-i t_1 y} \varphi(t_1)^n\, dt_1$$

from (12). Therefore (11) reduces to 

$$i\, g(y)\, \bar{z}_y = \frac{i k}{n}\, y\, g(y) + i a (m - k)\, g(y)$$

and we have at once the simple result 

$$\bar{z}_y = k y / n + a(m - k). \tag{14}$$

In an analogous manner, it may be shown that 

$$\bar{y}_z = k z / m + a(n - k). \tag{15}$$

Writing (14) and (15) in the forms 

$$\begin{aligned}
(\bar{z}_y - am) &= c_1 (y - an) \\
(\bar{y}_z - an) &= c_2 (z - am)
\end{aligned} \tag{16}$$

where $c_1 = k/n$ and $c_2 = k/m$ are the regression coefficients, it follows from the 
linearity of the regressions that the correlation coefficient is 

$$\rho = \sqrt{c_1 c_2} = k/\sqrt{mn}.$$

If $m = n$, we have a well known result which is sometimes stated as follows: 

If $y$ and $z$ are affected by $n$ equally likely causes of which $k$ are common to 
both, then the correlation coefficient between $y$ and $z$ is equal to $k/n$. 
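
The results (14) and (16) lend themselves to a quick simulation check. The Python sketch below is illustrative only: the exponential population (mean $a = 1$), the values of $n$, $m$, $k$, the seed and the function names are all assumptions of the sketch rather than anything specified in the note.

```python
import math
import random

def simulate_sums(n, m, k, trials=100_000, seed=3):
    """Draw (y, z) pairs: y sums n values; z reuses k of them plus m - k new ones."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(trials):
        xs = [rng.expovariate(1.0) for _ in range(n)]        # population mean a = 1
        common = rng.sample(xs, k)                            # k values chosen at random
        fresh = [rng.expovariate(1.0) for _ in range(m - k)]  # m - k additional values
        pairs.append((sum(xs), sum(common) + sum(fresh)))
    return pairs

def slope_and_correlation(pairs):
    """Least-squares slope of z on y, and the sample correlation coefficient."""
    count = len(pairs)
    mean_y = sum(y for y, _ in pairs) / count
    mean_z = sum(z for _, z in pairs) / count
    s_yy = sum((y - mean_y) ** 2 for y, _ in pairs)
    s_zz = sum((z - mean_z) ** 2 for _, z in pairs)
    s_yz = sum((y - mean_y) * (z - mean_z) for y, z in pairs)
    return s_yz / s_yy, s_yz / math.sqrt(s_yy * s_zz)

if __name__ == "__main__":
    n, m, k = 5, 4, 2
    slope, rho = slope_and_correlation(simulate_sums(n, m, k))
    print(f"slope of z on y: {slope:.3f}  (k/n = {k / n:.3f})")
    print(f"correlation:     {rho:.3f}  (k/sqrt(mn) = {k / math.sqrt(m * n):.3f})")
```

The fitted slope should settle near $k/n$ and the correlation near $k/\sqrt{mn}$, whatever population with finite variance is substituted for the exponential one.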


NORTHWESTERN UNIVERSITY. 





A NOTE ON CONFIDENCE INTERVALS AND INVERSE PROBABILITY 
By ALBERT WERTHEIMER 


The object of this note is to discuss a certain property of confidence intervals 
from the point of view of inverse probability. We shall not go into detailed 
applications, but merely into fundamental ideas, so we shall work with distribu- 
tion functions that are continuous and satisfy conditions which are sufficient 
to insure the validity of the mathematical steps used. 

A clear and concise statement of the subject is given in a paper by Neyman,¹ 
and we shall use it as the basis for our discussion. His presentation can be 
summarized as follows: Let $x$ be a sample statistic having a distribution function 

$$p(x, \theta), \qquad -\infty \le x \le \infty,$$

where $\theta$ is a parameter of the population. Now define two monotonic functions 

$$x = f(\theta); \quad x = g(\theta), \qquad \theta_1 \le \theta \le \theta_2,$$

such that $f(\theta) < g(\theta)$, and 

$$\int_{f(\theta)}^{g(\theta)} p(x, \theta)\, dx = 1 - \epsilon, \quad \text{for all } \theta. \tag{1}$$

Let the prior distribution function of $\theta$ be 

$$\psi(\theta), \qquad \theta_1 \le \theta \le \theta_2.$$

It then follows directly that the probability for any pair of values $(x, \theta)$ lying 
within the region enclosed by the curves is given by 

$$\int_{\theta_1}^{\theta_2} \psi(\theta)\, d\theta \int_{f(\theta)}^{g(\theta)} p(x, \theta)\, dx = 1 - \epsilon, \tag{2}$$

regardless of the prior function $\psi(\theta)$. His conclusion then is this: Stating that 

$$g^{-1}(x) \le \theta \le f^{-1}(x) \tag{3}$$

every time the observation gives us a value of $x$ equal to that given in (3) we 
may in any one instance be wrong; this will happen only if the pair $(x, \theta)$ for this 
observation lies outside the region enclosed by the curves; but from (2) the 
probability for this to happen is $\epsilon$. This statement is equivalent to saying that 
if for every observed $x$ we write the inequality (3), then for a large number of 
samples, the fraction $1 - \epsilon$ of the inequalities will be found correct. 

We note here that this is true only if in the inequality (3) $x$ is presumed to 
range over its entire interval of definition. But if for an observation $x = x'$, 
we mean to consider the corresponding inequality 

$$g^{-1}(x') \le \theta \le f^{-1}(x') \tag{4}$$

as one member of the class of inequalities that could be written just for those 
samples that had $x = x'$, then we can not assert that the inequality (4) has a 
probability of $1 - \epsilon$ of being correct. In fact, any probability statement dealing 
with this class must involve the prior distribution function $\psi(\theta)$; and if it is not 
given, then we do not know in what percent of cases the restricted inequality (4) 
will be found correct. 

1 Journal of the Royal Statistical Society, Vol. 97, part IV, 1934; pp. 589-93. 

Let us nevertheless approach the problem from the viewpoint of inverse 
probability. Having observed $x = x'$, the posterior probability of inequality 
(4) being correct is 

$$\eta(x') = \frac{\displaystyle\int_{g^{-1}(x')}^{f^{-1}(x')} \psi(\theta)\, p(x', \theta)\, d\theta}{\displaystyle\int_{\theta_1}^{\theta_2} \psi(\theta)\, p(x', \theta)\, d\theta}, \tag{5}$$

the numerator being the probability for the simultaneous occurrence of 
$x = x'$; $g^{-1}(x') \le \theta \le f^{-1}(x')$, 
and the denominator the probability² that $x = x'$, $\theta$ lying anywhere between 
$\theta_1$ and $\theta_2$. 

As long as $\psi(\theta)$ is unknown $\eta(x')$ cannot be evaluated; however its average 
value $\bar{\eta}(x)$ with respect to $x$ can be evaluated. By definition of an average, 

$$\bar{\eta}(x) = \int_{-\infty}^{\infty} \eta(x)\, dx \int_{\theta_1}^{\theta_2} \psi(\theta)\, p(x, \theta)\, d\theta. \tag{6}$$

From (5) we have 

$$\int_{g^{-1}(x)}^{f^{-1}(x)} \psi(\theta)\, p(x, \theta)\, d\theta = \eta(x) \int_{\theta_1}^{\theta_2} \psi(\theta)\, p(x, \theta)\, d\theta. \tag{7}$$

Integrating both sides of (7) over the entire range of $x$ we get 

$$\int_{-\infty}^{\infty} dx \int_{g^{-1}(x)}^{f^{-1}(x)} \psi(\theta)\, p(x, \theta)\, d\theta = \int_{-\infty}^{\infty} \eta(x)\, dx \int_{\theta_1}^{\theta_2} \psi(\theta)\, p(x, \theta)\, d\theta = \bar{\eta}(x).$$

2 When we say probability that $x = x'$, we mean the probability that $x$ will lie in the 
interval $x' \pm \tfrac12 dx$ to within terms of order $dx$. 

Interchanging the order of integration, as is permissible under the assumptions, 
we get 

$$\bar{\eta}(x) = \int_{\theta_1}^{\theta_2} \psi(\theta)\, d\theta \int_{f(\theta)}^{g(\theta)} p(x, \theta)\, dx.$$

But since 

$$\int_{f(\theta)}^{g(\theta)} p(x, \theta)\, dx = 1 - \epsilon, \quad \text{for all } \theta,$$

we finally get 

$$\bar{\eta}(x) = 1 - \epsilon.$$

Thus when approached from the standpoint of inverse probability we see that 
the average value of the posterior probability of the inequality (4) is precisely 
the quantity $1 - \epsilon$ regardless of the prior distribution function $\psi(\theta)$. 
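
The independence of this average from the prior can be illustrated by simulation. The following Python sketch makes several assumptions that are not in the note: $x$ is taken to be normal with mean $\theta$ and unit variance, so that $f(\theta) = \theta - c$ and $g(\theta) = \theta + c$ for a fixed constant $c$; the two priors, the seed and all names are chosen for the example only.

```python
import random
from statistics import NormalDist

def average_coverage(draw_theta, eps=0.10, trials=200_000, seed=7):
    """Fraction of simulated (x, theta) pairs for which inequality (3) holds."""
    rng = random.Random(seed)
    c = NormalDist().inv_cdf(1.0 - eps / 2.0)   # f(theta) = theta - c, g(theta) = theta + c
    hits = 0
    for _ in range(trials):
        theta = draw_theta(rng)                 # a draw from the prior psi(theta)
        x = rng.gauss(theta, 1.0)               # a draw from p(x, theta)
        hits += (x - c) <= theta <= (x + c)     # g^{-1}(x) <= theta <= f^{-1}(x)
    return hits / trials

def flat_prior(rng):
    """Prior spread evenly over a finite range."""
    return rng.uniform(-5.0, 5.0)

def skewed_prior(rng):
    """A strongly asymmetric prior, used only for contrast."""
    return rng.expovariate(2.0)

if __name__ == "__main__":
    print("flat prior:  ", average_coverage(flat_prior))
    print("skewed prior:", average_coverage(skewed_prior))
```

Both runs should return a fraction close to $1 - \epsilon$, even though the posterior probability $\eta(x')$ for any single observed $x'$ would differ markedly between the two priors.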

In conclusion it is a pleasure to thank Dr. Deming for the criticisms and 
suggestions which he has made in connection with this note. 


Navy DEPARTMENT, 
WASHINGTON. 





NOTE ON A MATCHING PROBLEM 


By SoLoMon KULLBACK 


1. Introduction. There are to be found in the literature [1] a number of discussions 
of the matching problem, i.e., the problem of deriving the distribution 
of the number of correct matchings when two sequences of elements are placed 
in correspondence. However, the formulation of the matching problem dis- 
cussed and illustrated herein is somewhat different from those problems already 
discussed in the literature [1], and may be of interest. A rather general state- 
ment of the problem follows. 


2. The Problem. Consider urns $U_i$, $i = 1, 2, \dots, n$, each of which contains 
some or all of the $r$ different elements $E_1, E_2, \dots, E_r$. The relative proportions 
of the $r$ elements in the $i$-th urn are $p_{i1}, p_{i2}, \dots, p_{ir}$ ($i = 1, 2, \dots, n$) 
such that 

$$p_{i1} + p_{i2} + \cdots + p_{ir} = 1, \tag{1}$$

$$p_{i1}^2 + p_{i2}^2 + \cdots + p_{ir}^2 = p_i. \tag{2}$$

(Some $p_{ij}$, $i = 1, 2, \dots, n$, $j = 1, 2, \dots, r$ may be zero.) 


Assuming each urn to be an infinite source, consider two sequences made by 
drawing, at random, a single element from each urn in turn. If the two se- 
quences are placed in correspondence there will be a number of correct match- 
ings. What is the distribution of the number of correct matchings if the fore- 
going process be indefinitely repeated? 


3. Solution of the Problem. The probability that the elements in the $k$-th 
position of the two sequences match may be derived by the following simple 
considerations. Since all the drawings are independent, the probability that 
both elements in the $k$-th position are $E_m$ is $p_{km}^2$. Accordingly, the probability 
that both elements are the same, irrespective of their particular identity, is 
$p_{k1}^2 + p_{k2}^2 + \cdots + p_{kr}^2 = p_k$. 

The theory for the number of correct matchings in this case thus corresponds 
to that for the Poisson series, which is well known [2]. For the special case in 
which $p_k = p$, $k = 1, 2, \dots, n$, the distribution of the number of correct matchings 
is in accordance with the binomial $(q + p)^n$ where $q = 1 - p$. 
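
The argument of this paragraph can be turned into a short calculation. The Python sketch below is a hedged illustration: the function names are invented for the example, and the position-by-position expansion of $\prod_k (q_k + p_k s)$ is simply one convenient way of obtaining the distribution for independent trials with unequal probabilities $p_k$.

```python
import math

def match_probability(proportions):
    """p_k = sum over m of p_km^2: chance that two independent draws from urn k agree."""
    return sum(p * p for p in proportions)

def match_count_distribution(match_probs):
    """Distribution of the number of correct matchings over positions 1..n.

    Expands the product of (q_k + p_k * s) one position at a time;
    result[x] is the probability of exactly x correct matchings.
    """
    dist = [1.0]
    for p in match_probs:
        nxt = [0.0] * (len(dist) + 1)
        for x, prob in enumerate(dist):
            nxt[x] += prob * (1.0 - p)      # position does not match
            nxt[x + 1] += prob * p          # position matches
        dist = nxt
    return dist

def binomial_pmf(x, n, p):
    """Term of the binomial (q + p)^n for exactly x successes."""
    return math.comb(n, x) * p**x * (1.0 - p) ** (n - x)

if __name__ == "__main__":
    p = match_probability([0.1] * 10)           # ten equally likely digits
    dist = match_count_distribution([p] * 300)  # sequences of 300 digits
    print("p =", p)
    print("P(30 matches) from the product expansion:", dist[30])
    print("P(30 matches) from the binomial term:    ", binomial_pmf(30, 300, p))
```

With identical urns the expansion reduces to the binomial term, and for ten equally likely digits and sequences of 300 it gives values that can be set beside those of Table 2.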


4. Numerical Illustration and Verification. The following illustration corresponds 
to the special case in which the urns are taken to be identical with equal 
proportions of each of the $r$ elements. 

Random sequences of 300 digits each were matched and the number of correct 
matchings recorded. The result of 457 such observations is given in Table 1. 


TABLE 1 

Observed distribution of number of correct matchings per sequence of 300 random 
digits each 

[Observed frequencies not recoverable from the source.] 

Average number of correct matchings: 29.9934 
Standard deviation: 4.8484 


TABLE 2 
Values of $P_x = (300!/x!(300 - x)!)(0.1)^x(0.9)^{300-x}$ 


 x     P_x       x     P_x       x     P_x       x     P_x 

14   0.00033    23   0.03240    32   0.06920    41   0.00875 
15   0.00070    24   0.04156    33   0.06245    42   0.00599 
16   0.00139    25   0.05099    34   0.05449    43   0.00400 
17   0.00257    26   0.05992    35   0.04601    44   0.00259 
18   0.00449    27   0.06756    36   0.03763    45   0.00164 
19   0.00741    28   0.07319    37   0.02984    46   0.00101 
20   0.01156    29   0.07628    38   0.02294    47   0.00061 
21   0.01713    30   0.07656    39   0.01713    48   0.00036 
22   0.02413    31   0.07409    40   0.01242    49   0.00020 


In accordance with paragraph 3, the distribution in Table 1 should correspond 
to the binomial distribution with $n = 300$ and $p = 10(1/10^2) = 1/10$. For the 
binomial distribution we have $m = np = 30$, $\sigma = \sqrt{npq} = \sqrt{27} = 5.1962$. 
To compare the observed distribution with the expected distribution we calculated 
the values of $P_x = (300!/x!(300 - x)!)(0.1)^x(0.9)^{300-x}$ for values of $x$ 
from 14 to 49 inclusive which are given in Table 2. 


TABLE 3 
Comparison of observed distribution with the theoretical distribution 
$457(0.9 + 0.1)^{300}$ 

(The observed frequencies $F_0$ are not recoverable from the source.) 

Number of correct     Theoretical frequency 
matchings                f        F = 457f 

14-16                 0.00242        1.1 
17-19                 0.01447        6.6 
20-22                 0.05282       24.1 
23-25                 0.12495       57.1 
26-28                 0.20067       91.7 
29-31                 0.22693      103.7 
32-34                 0.18614       85.1 
35-37                 0.11348       51.9 
38-40                 0.05249       24.0 
41-43                 0.01874        8.6 
44-46                 0.00524        2.4 
47-49 


TABLE 4 
Values of $(F_0 - F)^2/F$ and of $P(\chi^2 > \chi_0^2)$ 

[Entries not recoverable from the source.] 

To compare the observed and the theoretical distributions, and test the 
“Goodness of Fit,” the distributions were grouped in classes of three. The 
results are shown in Tables 3 and 4. 

5. Conclusion. The agreement between the observed distribution and the 
theoretical distribution derived on the basis of the argument in paragraph 3 
is quite satisfactory. 

We have shown herein, that if two sequences be matched under certain con- 
ditions, the distribution of the number of correct matchings will, in general, be 
that of a Poisson series and in special cases the binomial distribution. The 
theory was illustrated by an experiment which yielded results in satisfactory 
agreement with the theory. 



REPORT OF THE ANNUAL MEETING OF THE INSTITUTE 


The fourth annual meeting of the Institute of Mathematical Statistics was 
held in Detroit, Michigan, on December 27-29, 1938, in conjunction with the 
meetings of the American Statistical Association and the Econometric Society. 
The program for the meetings was arranged by Professors S. S. Wilks and 
B. H. Camp. 

On Tuesday morning, December 27, the Institute held a session devoted to 
contributed papers with Professor B. H. Camp, president of the Institute, in 
the chair. At that time the following papers were presented: 


1. Generalizations of the Laplace-Liapounoff Theorem. 
W. G. Madow, Millbank Management Corporation, New York. 
2. The standard errors of the geometric and harmonic means. 
Nilan Norris, Hunter College. 
3. Note on an integral equation in population analysis. 
Alfred J. Lotka, Metropolitan Life Insurance Company, New York. 
4. Optimum fiducial regions for simultaneous estimation of several population parameters 
from large samples. 
S. S. Wilks, Princeton University. 
5. A mathematical contribution to immigration assessment. 
Churchill Eisenhart, University of Wisconsin. 
6. Contributions to the theory of statistical estimation. 
A. Wald, Columbia University. 
7. On the hypotheses underlying the applications of statistical methods to routine laboratory 
analyses. 
J. Neyman, University of California. 
8. Commodity transformations and matrices. 
Harold Hotelling, Columbia University. 
9. Remarks on two methods of sample inspection. 
E. G. Olds, Carnegie Institute of Technology. 


Abstracts of these papers are given at the close of this report. 


Immediately following the session just described, the Institute convened in 
business session. At that time President Camp announced that the newly 
elected officers for the year 1939 are: President, P. R. Rider, Washington 
University; Vice-Presidents, C. C. Craig, University of Michigan, and S. S. 
Wilks, Princeton University; Secretary-Treasurer, A. T. Craig, University 
of Iowa. 

The annual luncheon of the Institute was held at one o'clock on the same 
day. At the luncheon, Dr. Walter A. Shewhart, of the Bell Telephone Laboratories, 
addressed the Institute on "The Future of Statistics in Mass Production." 
A summary of this address is included among the abstracts at the 
close of this report. 

On Wednesday morning, December 28, the Institute and the Statistical Asso- 
ciation held a joint session devoted to the teaching of Business Statistics. 
Professor T. H. Brown presided. The following papers constituted the program: 


1. The teaching of undergraduate students. 

L. S. Kellogg, Ohio State University. 
2. The teaching of graduate students. 

O. W. Blackett, University of Michigan. 
3. A bead-sampling machine for use in the class room. 

Dickson H. Leavens, Cowles Commission for Research in Economics. 
Discussion: Harry P. Hartkemeier, University of Missouri. 

Richard L. Kozelka, University of Minnesota. 


On the afternoon of the same day, the Biometric Section of the Statistical 
Association and the Institute presented the following program on Statistical 
Methods in Genetics Problems with Professor Lowell J. Reed as chairman: 


1. Tests of simple Mendelian inheritance in randomly collected data of one and two 
generations. 


Laurence H. Snyder and Charles W. Cotterman, Ohio State University. 
2. Statistical studies of the familial aspects of cancer in humans.
Herbert L. Lombard, Massachusetts State Department of Public Health.


3. Application of the method of likelihood ratios to the testing of hypotheses of simple 
Mendelian inheritance. 
S. S. Wilks, Princeton University. 


4. The application of statistical techniques to egg production data for the formulation of a
breeding program. 


W. C. Thompson, New Jersey Agricultural Experiment Station. 


The Program Committees of the Institute and the Statistical Association 
arranged a joint session on Representative Sampling for Thursday afternoon, 
December 29. At that time the following papers were presented, with Professor 
Harold Hotelling presiding: 


1. On the mathematics of the representative method. 
Allen T. Craig, University of Iowa. 


2. Application of the theory of sampling to large scale surveys and censuses.
Frederick F. Stephan, American Statistical Association. 


3. Further remarks on the mathematical aspects of representative sampling.
J. Neyman, University of California. 
Discussion: Samuel A. Stouffer, University of Chicago. 
Churchill Eisenhart, University of Wisconsin. 
P. J. Rulon, Harvard University. 


The final session of the meetings was held on Thursday evening. This was 
a joint session with the Econometric Society and was devoted to Mathematical 
Statistics in Economics. Professor Irving Fisher presided and the following
papers were given: 


1. On the hypothesis of linearity of regression in economic research. 
J. Neyman, University of California. 

2. The selection of variates for use in prediction. 
Harold Hotelling, Columbia University. 

3. Decomposition of time series on the basis of non-correlation principle.
Wassily Leontief, Harvard University. 


Discussion: William G. Madow, Milbank Management Corporation, New York.
Gerhard Tintner, Iowa State College.


A. T. Craig, Secretary.











ABSTRACTS OF PAPERS 


(Presented on December 27, 1938, at the Detroit meeting of the Institute) 


Generalizations of the Laplace-Liapounoff Theorem. W. G. Madow, Milbank
Management Corporation, New York. 


The Laplace-Liapounoff Theorem states conditions under which a linear function of 
chance variables has a normal limiting distribution. 

In dealing with limiting distributions arising in the analysis of variance, regression 
analysis, etc., there occurred problems which required for their solution the derivation of 
the joint limiting distribution of several linear functions of chance variables and the joint 
limiting distribution of functions which were linear in one set of chance variables for fixed 
values of other sets of chance variables. 

These problems were solved by a matrix formulation of the Laplace-Liapounoff Theorem 
and by the introduction of a function whose convergence to zero in probability provided a 
sufficient condition for the existence of normal limiting distributions. 

Various generalizations with a view towards applications in multi-variate statistical 
analyses are discussed. The theorems provide a rigorous and complete basis for the deriva- 
tion of limiting distributions of quadratic and bilinear forms. 


The Standard Errors of the Geometric and Harmonic Means. Nilan Norris,
Hunter College. 


Although certain properties of the geometric and harmonic means have been investigated 
extensively, there seems to have been no derivation of expressions for their variances in 
cases where they are used as estimates of parameters of parent populations. 

Application of the modern theory of estimation makes it possible to develop simple 
and useful formulae for the standard errors of these two averages for each of the respective 
general classes of cases in which they are most suitable. 

As in other instances in which standard errors are used in tests of significance, fiducial 
or confidence limits may be employed to overcome certain limitations of the outmoded 
practice of relying solely on multiples of either probable or standard errors to determine 
whether or not a result exists merely because of sampling fluctuations. 


Note on an Integral Equation in Population Analysis. Alfred J. Lotka, Metropolitan
Life Insurance Company, New York.


In a population in which immigration and emigration are negligible, the number N(t)
of the population at time t is connected with the annual births B(t) and the probability p(a)
of surviving from birth to age a, by the obvious relation

(1)    $N(t) = \int_0^\infty B(t - a)\, p(a)\, da.$

If B(t) and p(a) are given, N(t) follows at once by direct integration. The inverse
problem, given N(t), to find B(t), requires separate treatment. The case that N(t) is
given or can be expressed as a sum of exponential functions has been discussed by the
author on a former occasion. In the present communication it is shown how the function
B(t) can be expressed as a series proceeding in ascending derivatives of N(t).
A second solution is also offered in which

    $B(t)/N(t) = b(t),$

the birth rate per head, is obtained as a series, the first and dominating term of which is

(2)    $b(t) = \dfrac{1}{\int_0^\infty e^{-r_t a}\, p(a)\, da}$

where $r_t$ is the rate of natural increase at time t. This development is of interest because
it corresponds to the expression for b in a population with constant birth rate, death rate,
and rate of natural increase; that is

(3)    $b = \dfrac{1}{\int_0^\infty e^{-r a}\, p(a)\, da}$

so that the new expression represents b(t) as the corresponding value of b in a Malthusian
population, plus a series of correcting terms.
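
The Malthusian expression for b, namely the reciprocal of the integral of e^{-ra} p(a) over all ages, can be checked numerically. The sketch below is only an illustration: the survival curve p(a) = e^{-MU a}, the values of MU and R, the finite cutoff a_max and the function names are all assumptions made for the example and are not taken from the abstract. For that particular curve the integral is 1/(R + MU), so b should come out close to R + MU.

```python
import math

def malthusian_birth_rate(r, survival, a_max=1000.0, steps=100_000):
    """Approximate b = 1 / integral_0^a_max exp(-r*a) * survival(a) da.

    Uses the trapezoidal rule; a_max stands in for the infinite upper limit.
    """
    h = a_max / steps

    def integrand(a):
        return math.exp(-r * a) * survival(a)

    total = 0.5 * (integrand(0.0) + integrand(a_max))
    for k in range(1, steps):
        total += integrand(k * h)
    return 1.0 / (total * h)

MU = 0.02  # hypothetical constant death rate per head
R = 0.01   # hypothetical rate of natural increase

def exponential_survival(a):
    """Hypothetical survival curve p(a) with constant force of mortality MU."""
    return math.exp(-MU * a)

b = malthusian_birth_rate(R, exponential_survival)
print(b)  # close to R + MU for this survival curve
```

With a constant death rate the result simply restates that birth rate less death rate equals the rate of natural increase; any other tabulated p(a) can be passed in the same way.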


Optimum Fiducial Regions for Simultaneous Estimation of Several Population
Parameters from Large Samples. S. S. Wilks, Princeton University.


If a population has a distribution law $f(x, \theta)$ where x is the variate and $\theta$ is a parameter,
it is known (Annals of Mathematical Statistics, Vol. IX (1938) pp. 166-175) that under
rather general conditions, confidence intervals, for a given confidence coefficient $\alpha$, which
are shortest on the average, can be obtained from large samples of n items by solving the
equations

(1)    $\dfrac{1}{\sqrt{n}}\, \dfrac{\partial L / \partial\theta}{\sqrt{E\left[(\partial \log f / \partial\theta)^2\right]}} = \pm d_\alpha$

for $\theta$, where $d_\alpha$ is the normal deviate given by
$\frac{1}{\sqrt{2\pi}} \int_{-d_\alpha}^{+d_\alpha} e^{-t^2/2}\, dt = \alpha$. L is the logarithm
of the likelihood, i.e. $L = \sum_{i=1}^{n} \log f(x_i, \theta)$, and E denotes mean value with respect to
the probability law $f(x, \theta)$.

The present paper is an extension of the foregoing results to the case of several parameters.
It is shown under fairly general conditions that if the distribution law of x is a
function $f(x, \theta_1, \ldots, \theta_h)$ depending on h parameters, then for a confidence coefficient $\alpha$
the fiducial region of the $\theta$'s which is smallest in size on the average is given by the region
in the space of the $\theta$'s for which

(2)    $\dfrac{1}{n} \sum_{i,j=1}^{h} a_{ij}\, \dfrac{\partial L}{\partial\theta_i}\, \dfrac{\partial L}{\partial\theta_j} \le \chi_\alpha^2$

where $\chi_\alpha^2$ is such that $P(\chi^2 < \chi_\alpha^2) = \alpha$, where the probability is calculated from a $\chi^2$ distribution
with h degrees of freedom. The matrix $\| a_{ij} \|$ is the inverse of the matrix whose
general element is

    $E\left[\dfrac{\partial \log f}{\partial\theta_i}\, \dfrac{\partial \log f}{\partial\theta_j}\right].$

Similar results hold when f is a function of several random variables as well as the parameters
$\theta_1, \theta_2, \ldots, \theta_h$.
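
For one parameter, the large-sample interval can be worked out in closed form in simple cases. The sketch below is an illustration under stated assumptions: it reads the one-parameter equations as the standardized score, (1/sqrt(n)) (dL/dtheta) / sqrt(E[(d log f/d theta)^2]) = ±d, and it assumes an exponential density f(x, theta) = theta e^{-theta x}, which is not mentioned in the abstract. For that density dL/dtheta = n/theta − sum(x) and E[(d log f/d theta)^2] = 1/theta^2, so the equations reduce to sqrt(n) (1 − theta · mean) = ±d. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def normal_deviate(alpha):
    """Return d such that a standard normal puts probability alpha on (-d, +d)."""
    return NormalDist().inv_cdf(0.5 * (1.0 + alpha))

def exponential_rate_interval(xs, alpha=0.95):
    """Large-sample interval for theta in the assumed density theta*exp(-theta*x).

    Solves sqrt(n) * (1 - theta * mean) = +d and = -d for theta.
    Only meaningful for large n (in particular d < sqrt(n)).
    """
    n = len(xs)
    mean = sum(xs) / n
    d = normal_deviate(alpha)
    lower = (1.0 - d / math.sqrt(n)) / mean
    upper = (1.0 + d / math.sqrt(n)) / mean
    return lower, upper

# Hypothetical data: 100 observations with sample mean 1.
lower, upper = exponential_rate_interval([1.0] * 100, alpha=0.95)
print(lower, upper)
```

The interval is centred on the maximum likelihood estimate 1/mean and its half-width shrinks like 1/sqrt(n), as expected for a large-sample result.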


A Mathematical Contribution to Immigration Assessment. Churchill Eisenhart,
University of Wisconsin.


A certain problem in assessing the size of an immigration can be stated mathematically
as follows: A sample of size N is drawn at random from a population in which the probability
of A is p. Let “not-A” be denoted by B. Then the sample will contain a frequency,
say x, of A and N − x of B, x being a random variable. This sample is now mixed together
with a very much larger sample in which the elements are all B's, and the B's belonging
to the original sample lost sight of. The problem is to estimate N from the observed
frequency of A, namely x, in the composite sample, p being assumed known. The maximum
likelihood estimate of N is x/p. For large values of x, and a fortiori of N, confidence
intervals for N take the form $N_1 < N < N_2$ where

    $N_1 = \dfrac{\{\sqrt{q^2 t^2 + 4q(x - \frac{1}{2})} - qt\}^2}{4pq},$

    $N_2 = \dfrac{\{\sqrt{q^2 t^2 + 4q(x + \frac{1}{2})} + qt\}^2}{4pq},$

    $q = 1 - p,$

and the confidence coefficient is .95 if t is set equal to 1.96, and is .99 if t is set equal to 2.58.


For small values of x the solution is more difficult but charts are being prepared from 
which the confidence intervals can be read off. 
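
The large-sample limits can be evaluated directly. The sketch below assumes the limits have the form N = {sqrt(q²t² + 4q(x ∓ ½)) ∓ qt}² / (4pq) with q = 1 − p, which is what the normal approximation to the binomial with a half-unit continuity correction gives (the lower limit solves Np + t·sqrt(Npq) = x − ½). The function name and the numerical inputs are hypothetical.

```python
import math

def immigration_size_interval(x, p, t=1.96):
    """Return (estimate, n1, n2): the estimate x/p and large-sample limits for N.

    Assumes the normal approximation with a continuity correction of 1/2;
    intended for large x only.
    """
    q = 1.0 - p

    def limit(shift, sign):
        root = math.sqrt(q * q * t * t + 4.0 * q * (x + shift))
        return (root + sign * q * t) ** 2 / (4.0 * p * q)

    n1 = limit(-0.5, -1.0)  # lower limit
    n2 = limit(+0.5, +1.0)  # upper limit
    return x / p, n1, n2

# Hypothetical figures: 200 individuals of type A observed, p = 0.1.
estimate, n1, n2 = immigration_size_interval(200, 0.1)
print(estimate, n1, n2)
```

Setting t to 2.58 in place of 1.96 widens the interval to the .99 level mentioned in the abstract.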


A Contribution to the Theory of Statistical Estimation. A. Wald, Columbia
University. 


Let us denote by $f(x, \theta)$ a probability density function, where $\theta$ is a parameter. Denote
by $\Omega$ the set of all possible values of $\theta$. The assumption that $\theta$ belongs to a subset $\omega$ of $\Omega$
is called a hypothesis. Let us consider a system S of subsets of $\Omega$. Denote the hypothesis
corresponding to an element $\omega$ of S by $H_\omega$ and the system of all hypotheses corresponding
to the elements of S by $H_S$. Denote by E a sample point in the n-dimensional sample
space drawn from a population with the probability density function $f(x, \theta)$, where the
value of $\theta$ is unknown. We have to decide by means of the sample point E which hypothesis
of the system $H_S$ should be accepted. That is to say, for each hypothesis $H_\omega$
we have to choose a region of acceptance $M_\omega$ in the sample space. The hypothesis $H_\omega$
will be accepted if and only if the sample point E falls in the region $M_\omega$. Denote by $M_S$
the system of all regions $M_\omega$. The statistical problem to be solved is the question of
how the system of regions $M_S$ should be chosen.

In order to answer this question, a non-negative weight function $w(\theta, \omega)$ is introduced,
which is defined for all values $\theta$ and for all elements $\omega$ of S. The weight $w(\theta, \omega)$ expresses
the loss caused by accepting $H_\omega$ if $\theta$ is true. The probability of accepting $H_\omega$ multiplied
by the weight $w(\theta, \omega)$ is called the risk of accepting $H_\omega$ if $\theta$ is true. Denote this risk by
$r(\theta, H_\omega, M_S)$ (the risk depends obviously also on the system of regions $M_S$). The total
risk of accepting a false hypothesis if $\theta$ is true, is given by

    $r(\theta, M_S) = \sum_{\omega} r(\theta, H_\omega, M_S)$

where the summation is to be taken over all elements $\omega$ of S which do not contain $\theta$.
Denote by $r(M_S)$ the maximum of $r(\theta, M_S)$ with respect to $\theta$. The system $M_S$ of regions
for which $r(M_S)$ becomes a minimum is called the “best” system of regions relative to the
weight function $w(\theta, \omega)$. Some properties of the best system of regions have been studied
and the problem of its calculation has been treated.
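
The definition of the “best” system of regions can be made concrete on a toy problem small enough to search exhaustively. Everything specific in the sketch below is an assumption for illustration: three parameter values, a sample space of three points with the probabilities in PROB, hypotheses that are the single-point subsets, and a weight equal to 1 for every false hypothesis. A system of regions is represented by a tuple giving, for each sample point, the index of the hypothesis accepted there.

```python
from itertools import product

# Hypothetical toy problem: PROB[theta][e] is the probability of sample point e.
PROB = {0: [0.7, 0.2, 0.1], 1: [0.2, 0.6, 0.2], 2: [0.1, 0.2, 0.7]}
HYPOTHESES = [frozenset({0}), frozenset({1}), frozenset({2})]

def weight(theta, omega):
    """Assumed loss from accepting the hypothesis omega when theta is true."""
    return 0.0 if theta in omega else 1.0

def total_risk(theta, assignment):
    """Sum of (probability of accepting omega) * weight over omega not containing theta."""
    risk = 0.0
    for k, omega in enumerate(HYPOTHESES):
        if theta in omega:
            continue
        accept_prob = sum(
            PROB[theta][e] for e, chosen in enumerate(assignment) if chosen == k
        )
        risk += accept_prob * weight(theta, omega)
    return risk

def max_risk(assignment):
    """Maximum of the total risk with respect to theta."""
    return max(total_risk(theta, assignment) for theta in PROB)

def best_system():
    """Search every assignment of sample points to hypotheses for the smallest maximum risk."""
    return min(product(range(len(HYPOTHESES)), repeat=3), key=max_risk)

best = best_system()
print(best, max_risk(best))
```

Exhaustive search is feasible only because the sample space here has three points; the abstract's problem concerns regions of an n-dimensional sample space, for which this brute-force approach is not a substitute.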


On the Hypotheses underlying the Applications of Statistical Methods to 
Routine Laboratory Analyses. J. Neyman, University of California.


The problem considered is that of estimating the proportion, p, of certain designated 
elements of the population sampled, the estimate to be based on a random sample of n, 
drawn by some mechanical device, such as is sometimes used in industry and in laboratory 
work. Examples: (1) to estimate the proportion of defective manufactured products in 
mass production; (2) to estimate the proportion of seeds which are able to germinate in 
given conditions. One would expect that the sample proportion, say q, will be distri- 
buted in repeated samples according to the Binomial Law and that, consequently, in order 
to obtain the confidence limits for p, one should use the Clopper-Pearson graphs. How- 
ever, the evidence obtained from special analyses on seed germination, made in the Seed 
Testing Station of Warsaw, Poland, shows that this assumption may not be true. The 
sampling there was carried out by means of a machine and involved a certain amount of 
mixing. As a result q was more stable than was expected. It did not follow the Binomial
Law at all, but a Normal one about p, with a standard deviation, $\sigma$, which could
be well estimated from the sum of squares of deviations from respective means. For a
considerable period of time (18 months) $\sigma$ retained a constant value (a characteristic of
the action of the sampling machine) which was rather smaller than $(n^{-1} q(1 - q))^{1/2}$.

Consequently, to have a preassigned frequency of correct statements concerning p,
it was necessary to calculate the confidence intervals according to the formulae of the
Normal Theory

    $q - t\sigma < p < q + t\sigma$

with an appropriate value of t. Probably similar situations are rather common.
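
The contrast between the binomial standard deviation and the empirically estimated one is easy to see numerically. In the sketch below the sample size, the observed proportion and the machine's standard deviation are hypothetical figures chosen for illustration, not values from the Warsaw analyses, and the function names are likewise assumptions.

```python
import math

def binomial_sd(n, q):
    """Standard deviation of a sample proportion under the Binomial Law."""
    return math.sqrt(q * (1.0 - q) / n)

def normal_theory_interval(q, sigma, t=1.96):
    """Interval q - t*sigma < p < q + t*sigma from the Normal Theory."""
    return q - t * sigma, q + t * sigma

n = 400        # hypothetical number of seeds per sample
q = 0.90       # hypothetical observed proportion germinating
sigma = 0.010  # hypothetical standard deviation estimated for the machine

print(binomial_sd(n, q))                  # what the Binomial Law would give
print(normal_theory_interval(q, sigma))   # narrower interval based on sigma
```

When sigma is smaller than the binomial value, an interval built on the binomial assumption would be wider than necessary for the stated frequency of correct statements.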


Remarks on Two Methods of Sampling Inspection. E. G. Olds, Carnegie
Institute of Technology. 


When the instructions for inspecting lots of size m specify that samples of size n be
taken and the lot be passed without detailed inspection if no defectives are found, then,
on the average, the maximum number of defectives are passed when the number of defects
per lot is

    $\dfrac{m+1}{n+1}$  or  $\dfrac{m+1}{n+1} - 1.$

If the quality of a lot is to be checked by drawing pieces until a fixed number of defective
pieces are found, it is important to know that the expected number $n_i$ necessary to obtain i
defects is $\dfrac{i(m+1)}{p+1}$, where there are p defects in the lot. If $\dfrac{n_i}{i(m+1)}$ is used as an estimate
of $\dfrac{1}{p+1}$, it is convenient to observe that the variance of $\dfrac{n_i}{i(m+1)}$ is

    $\dfrac{1}{p+2} \left(\dfrac{1}{p+1} - \dfrac{1}{m+1}\right) \left(\dfrac{1}{i} - \dfrac{1}{p+1}\right).$
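
The first statement can be checked by exact enumeration. The sketch below assumes sampling without replacement, so that a lot of m pieces containing p defectives is passed with probability C(m − p, n) / C(m, n), and the expected number of defectives passed is p times that probability. The lot and sample sizes are hypothetical, chosen so that (m + 1)/(n + 1) is a whole number; the function names are assumptions.

```python
from fractions import Fraction
from math import comb

def expected_defects_passed(m, n, p):
    """Expected defectives passed: p * P(no defective among the n pieces drawn)."""
    return Fraction(p * comb(m - p, n), comb(m, n))

def worst_defect_counts(m, n):
    """Return every defect count p at which the expected number passed is largest."""
    values = {p: expected_defects_passed(m, n, p) for p in range(m + 1)}
    peak = max(values.values())
    return [p for p, value in values.items() if value == peak]

# Hypothetical plan: lots of 19 pieces, samples of 4, so (m + 1)/(n + 1) = 4.
print(worst_defect_counts(19, 4))
```

Exact fractions are used so that the tie between the two neighbouring defect counts is detected without rounding error.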


Commodity Transformations and Matrices. Harold Hotelling, Columbia
University. 


If we regard the prices and quantities of n commodities as vectors we may apply the
theory of linear transformations in various ways having economic and statistical signifi- 
cance. An example is the mixing of grains to produce results conforming to new specifica- 
tions, as in international trade. Another kind of example arises in problems of multi- 
variate statistical analysis such as those treated in my paper on ‘‘Relations between Two 
Sets of Variates’’ (Biometrika, 1936), concerned with properties invariant under internal 
linear transformations of the variates of each set. Prices transform contragrediently to 
quantities in all cases. Hence, if the prices and quantities of the same set of commodities 
are the two sets of variates, the allowable transformations are restricted. Consequently 
there are invariants in this case additional to those discussed in the paper mentioned. 
Another problem is the reduction of sets of linear demand functions to normal form and 
determination of invariants when transformations of prices must be contragredient to 
those of commodities. The question whether the demand functions are symmetrical is 
here of paramount importance, since symmetry is preserved by such transformations, 
and since there are known theoretical reasons to expect symmetry. For a non-singular 
set of linear symmetrical demand or supply functions there are no invariants under arbi- 
trary sets of contragredient transformations; but for pairs of such sets of demand and 
supply functions there are invariants, namely the elementary divisors of the pair of ma- 
trices of coefficients. A set of demand or supply functions alone has invariants if its 
matrix B is not symmetrical. If B′ denote the transverse or conjugate of B, the elementary
divisors of B + λB′ are such invariants.
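
The statement that prices transform contragrediently to quantities can be illustrated with two commodities. In the sketch below, which is an illustration and not part of the abstract, quantities are transformed by a matrix A and prices by the transpose of the inverse of A, so that the total value (the sum of price times quantity) is unchanged. The matrix and the price and quantity figures are hypothetical.

```python
from fractions import Fraction

def transform_quantities(a, q):
    """New quantities q' = A q for a 2x2 matrix A."""
    return [a[0][0] * q[0] + a[0][1] * q[1],
            a[1][0] * q[0] + a[1][1] * q[1]]

def transform_prices(a, p):
    """New prices p' = (inverse of A) transposed, applied to p."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    inv = [[a[1][1] / det, -a[0][1] / det],
           [-a[1][0] / det, a[0][0] / det]]
    return [inv[0][0] * p[0] + inv[1][0] * p[1],
            inv[0][1] * p[0] + inv[1][1] * p[1]]

def total_value(p, q):
    return p[0] * q[0] + p[1] * q[1]

# Hypothetical mixing matrix, quantities and prices.
A = [[Fraction(3, 4), Fraction(1, 4)],
     [Fraction(1, 4), Fraction(3, 4)]]
quantities = [8, 4]
prices = [5, 3]

new_quantities = transform_quantities(A, quantities)
new_prices = transform_prices(A, prices)
print(total_value(prices, quantities), total_value(new_prices, new_quantities))
```

Both totals agree because the product of the inverse of A with A is the identity, which is the sense in which the two transformations are contragredient.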


The Future of Statistics in Mass Production.¹ Walter A. Shewhart, Bell
Telephone Laboratories, New York. 


Much has been written about the application of statistical theory and technique in
studying, discovering, and measuring the effects of an existing system of unknown or
chance causes. Much remains to be written about the application of statistical theory 
and technique in finding out how to tinker with and modify an existing chance cause 
system until it behaves as we would have it do. In research, we use statistical theory in 
helping to predict the future effects of some existing cause system. The statistician knows 
that his predictions will be valid if certain assumptions about the cause system are justi- 
fied. Perhaps the most important assumption of this type is that the particular effects 
of a chance cause system under study are random. In mass production, however, the 
statistician has learned by experience that chance cause systems producing random effects 
don’t just happen even under what we customarily consider to be the best regulated 
laboratory conditions. If the industrial statistician chooses to ignore this fact and makes 
predictions as if he were dealing with random cause systems, he may expect many of his 
predictions to fall far short of the truth: what is more, he knows that this fact will be 
discovered and his work discredited because in a continuing mass production process 
predictions are sure to be checked. Hence the industrial statistician in mass production 
must start not where the research statistician leaves off but, as it were, before the research 
statistician begins: that is, he must start by developing techniques for determining when 
we are justified in assuming that the underlying cause system is random. We thus arrive 
at a good starting point from which to consider the future of statistics in mass production.

Experience in the control of quality has provided a practical technique for detecting
and eliminating assignable causes of variability in the production process until a state of
statistical control is reached where predictions based upon the assumption of randomness
are likely to prove valid. It has also been shown elsewhere that by elimination of assignable
causes of variability, we may make the most efficient use of raw materials, maximize
the assurance of maintaining standard quality of manufactured goods, minimize the cost
of inspection, and minimize the cost of rejections. Hence we may conclude that the use
of statistics in mass production can be made to pay good dividends: such use can be made
to have a bright future. On what then does that future depend?

¹ Summary of an address delivered at a luncheon meeting of the Institute of Mathematical
Statistics, Detroit, Michigan, December 27, 1938.

To answer this question, we must consider the following three fundamental steps in the 
process of mass production: 

I. The specification of the quality of the thing wanted.
II. The production of things designed to meet the specification.
III. The inspection of the things produced to see if they meet the specification.

An outstanding characteristic of the first step, specification, is the necessity of setting up
and living within what we term a tolerance range² for each specified quality characteristic.
If a producer contracts to live within some specified range and in taking steps II and III 
fails to do so, he usually loses a lot of money. Hence he must know how to set tolerance 
limits that he can meet. Moreover, if he is to be able to make the most efficient use of 
materials in many instances, he must close up as much as he feasibly can on the specified 
tolerance range. 

Obviously, however, one can not specify a practically attainable tolerance range out of 
thin air: one must be limited by what it is possible to do under commercial conditions of 
production in step II and this in turn is revealed by inspection in step III. We must also 
take into account the fact previously noted that any manufacturing process to begin with 
is almost certain not to be in a state of statistical control. In fact, this state can only be 
approached through the application of certain statistical techniques that have been found 
useful in detecting the presence of assignable causes that can be found and removed. A 
point to be stressed is that the three steps—specification, production, and inspection—in 
mass production, cannot be taken independently: instead, they must be coördinated. It
also may be shown that maximum effectiveness in the use of statistical theory can only
be attained by coördinating the applications in each of the three steps. It is significant
to note that in order to attain the most efficient use of materials and processes by mini- 
mizing the tolerance range and in order to minimize the cost of production, one must 
make effective use of the results obtained in the course of commercial production, par- 
ticularly those in the third step, inspection. In fact, the three steps might be thought of 
as constituting a scientific experiment in which the objective is the attainment of the 
most efficient use of available materials in the production of manufactured goods. 

Broadly speaking, the statistician of the future has before him the opportunity of 
helping to develop this fundamental type of experiment in many respects like the way he 
is successfully helping today in so many fields of research to design experimental procedures 
that make the most efficient use of human effort. Certain differences, of course, exist. 
For example, as already noted, he must start by designing a statistical control technique 
for randomizing, as it were, the cause system through the elimination of assignable causes. 
Then he can use modern statistical techniques of research in much the same way described 
in the literature with reasonable assurance that resulting predictions will be found valid 
because he has first randomized his cause system. He must, however, go farther than 
indicated in the current literature of statistical research in that he must provide operationally
verifiable meanings for statistical terms such as random variable, accuracy, precision,
true value, probability, degree of rational belief, and the like.³ This is particularly
necessary in the steps of specification and inspection because the specification is
often made the basis of a contractual agreement between producers and consumers.

² The tolerance range is not to be confused with the fiducial range of modern statistics.
The distinction between the two is set forth at some length in a forthcoming publication,
Statistical Method from the Viewpoint of Quality Control, to be published shortly by the
Graduate School of the United States Department of Agriculture.

There is a sense in which the statistician’s problem in helping to develop the mass 
production process so as to make the most effective use of information yielded by the 
process is much more complicated than the design of experiment usually considered in 
the literature of statistics. Whereas the customary statistical theory of design of experi- 
ment in research is concerned with comparatively small-scale experiments carried out 
under controlled conditions of the laboratory by a few people, the corresponding develop- 
ment of the mass production process must be carried out under commercial conditions on a 
large scale involving large numbers of people. For example, the three steps in the mass 
production process are usually carried out either by different companies or by different 
departments of the same company. The steps may involve the coördinated effort of
literally hundreds and even thousands of employees, including physicists, chemists, en- 
gineers, sales agents, purchasing agents, lawyers, and economists. Very few of these 
people have ever had any training in statistics or probability and yet many of them must 
be sold on the use of statistics if the statistician is to have an opportunity of making 
his full contribution to the social and economic effectiveness of the mass production 
process. This situation constitutes a problem not only for those now in industry but 
also for those responsible for training the industrial leaders of tomorrow so that they 
will have sufficient knowledge of statistics to help them recognize the potential contribu- 
tions of statistical theory and technique. 

In conclusion, then, we may say that in the future the statistician in mass production 
must do more than simply study, discover, and measure the effects of existing chance 
cause systems: he must devise means for modifying these cause systems in the best way 
to satisfy human wants. The statistician in mass production must not be satisfied with 
simply measuring demand for goods; he must help change that demand by showing, among 
other things, how to close up the tolerance range and improve the quality of goods. He 
must not be content with measuring production costs; he must help decrease production 
costs through the use of the techniques of statistical control. 

The future contribution of statistics in mass production lies not so much in solving 
the problems usually put to the statistician today by those not statistically trained as 
in taking a hand in helping to coördinate the steps of specification, production, and inspection
considered as a scientific experiment for making the most efficient use of human
effort in the production of goods to satisfy human wants. The long range contribution 
of statistics depends perhaps not so much upon getting a lot of highly trained statisticians 
into industry in the immediate future as it does in creating a statistically minded new 
generation of those physicists, chemists, engineers, and others who will in any way have a 
hand in developing and directing the mass production process of tomorrow. 


³ An initial step in this direction has been taken in my Washington lectures. Loc. cit.


CONSTITUTION OF THE INSTITUTE OF MATHEMATICAL STATISTICS


ARTICLE I
NAME AND PURPOSE 


1. This organization shall be known as the Institute of Mathematical Statistics. 
2. Its object shall be to promote the interests of mathematical statistics. 


ARTICLE II 
MEMBERSHIP 


1. The membership of the Institute shall consist of Members, Fellows, Honorary 
Members, and Sustaining Members. 

2. Voting members of the Institute shall be (a) the Fellows, and (b) all others who 
have been members for twenty-three months prior to the date of voting. 


ARTICLE III 


OFFICERS, BOARD OF DIRECTORS, COMMITTEE ON MEMBERSHIP, AND COMMITTEE ON
PUBLICATIONS 


1. The Officers of the Institute shall be a President, two Vice-Presidents, and a Secre- 
tary-Treasurer, elected for a term of one year by a majority ballot at the annual meeting 
of the Institute. Voting may be in person or by mail. 

(a) Exception. The first group of Officers shall be elected by a majority vote of the 
individuals present at the organization meeting, and shall serve until December 31, 1936. 

2. The Board of Directors of the Institute shall consist of the Officers and the previous 
President. 

3. The Institute shall have a Committee on Membership composed of three Fellows. 
At their first meeting subsequent to the adoption of this Constitution, the Board of 
Directors shall elect three members as Fellows to serve as the Committee on Membership, 
one member of the Committee for a term of one year, another for a term of two years, 
and another for a term of three years. Thereafter the Board of Directors shall elect 
from among the Fellows one member annually at their first meeting after their election 
for a term of three years. The President shall designate one of the Vice-Presidents as
Chairman of this Committee. 

4. The Institute shall have a Committee on Publications composed of three Members
or Fellows elected by the Board of Directors. The President shall designate a Vice- 
President as Ex Officio Chairman of this Committee. 


ARTICLE IV 


MEETINGS 


1. A meeting for the presentation and discussion of papers, for the election of Officers, 
and for the transaction of other business of the Institute shall be held annually at such 
time as the Board of Directors may designate. Additional meetings may be called from
time to time by the Board of Directors and shall be called at any time by the President 
upon written request from ten Fellows. Notice of the time and place of meeting shall be 
given to the membership by the Secretary-Treasurer at least thirty days prior to the 
date set for the meeting. All meetings except executive sessions shall be open to the 
public. Only papers accepted by a Program Committee appointed by the President may 
be presented to the Institute. 

2. The Board of Directors shall hold a meeting immediately after their election and 
again immediately before the expiration of their term. Other meetings of the Board 
may be held from time to time at the call of the President or any two members of the 
Board. Notice of each meeting of the Board, other than the two regular meetings, 
together with a statement of the business to be brought before the meeting, must be given 
to the members of the Board by the Secretary-Treasurer at least five days prior to the 
date set therefor. Should other business be passed upon, any member of the Board shall 
have the right to reopen the question at the next meeting. 

3. The Committee on Membership shall hold a meeting immediately after the annual 
meeting of the Institute. Further meetings of the Committee may be held from time to 
time at the call of the Chairman or any member of the Committee provided notice of such 
call and the purpose of the meeting is given to the members of the Committee by the 
Secretary-Treasurer at least five days before the date set therefor. Should other business 
be passed upon, any member of the Committee shall have the right to reopen the question 
at the next meeting. 

4. At a regularly convened meeting of the Board of Directors, three members shall 
constitute a quorum. At a regularly convened meeting of the Committee on Member- 
ship, two members shall constitute a quorum. 


ARTICLE V 


PUBLICATIONS 


1. The Annals of Mathematical Statistics shall be the Official Journal for the Institute. 
Other publications may be originated by the Board of Directors as occasion arises. 





ARTICLE VI 
EXPULSION OR SUSPENSION 


1. Except for non-payment of dues, no one shall be expelled or suspended except by 
action of the Board of Directors with not more than one negative vote. 





ARTICLE VII
AMENDMENTS 


1. This constitution may be amended by an affirmative two-thirds vote at any regularly 
convened meeting of the Institute provided notice of such proposed amendment shall 
have been sent to each voting member by the Secretary-Treasurer at least thirty days 
before the date of the meeting at which the proposal is to be acted upon. Voting may be 
in person or by mail. 











BY-LAWS 


ARTICLE I 


DUTIES OF THE OFFICERS, BOARD OF DIRECTORS, COMMITTEE ON MEMBERSHIP, AND
COMMITTEE ON PUBLICATIONS 


1. The President, or in his absence, one of the Vice-Presidents, or in the absence of the 
President and both Vice-Presidents, a Fellow selected by vote of the Fellows present, 
shall preside at the meetings of the Institute and of the Board of Directors. At meetings 
of the Institute, the presiding officer shall vote only in the case of a tie, but at meetings of 
the Board of Directors he may vote in all cases. At least three months before the date of 
the annual meeting, the President shall appoint a Nominating Committee of three 
members. It shall be the duty of the Nominating Committee to make nominations for 
Officers to be elected at the annual meeting and the Secretary-Treasurer shall notify all 
voting members at least thirty days before the annual meeting. Additional nominations 
may be submitted in writing, if signed by at least ten Fellows of the Institute, up to the 
time of the meeting. 

2. The Secretary-Treasurer shall keep a full and accurate record of the proceedings 
at the meetings of the Institute and of the Board of Directors, send out calls for said 
meetings and, with the approval of the President and the Board, carry on the corre- 
spondence of the Institute. Subject to the direction of the Board, he shall have charge 
of the archives and other tangible and intangible property of the Institute. He shall 
send out calls for annual dues and acknowledge receipt of same; pay all bills approved 
by the President for expenditures authorized by the Board or the Institute; keep a 
detailed account of all receipts and expenditures, prepare a financial statement at the 
end of each year and present an abstract of the same at the annual meeting of the Institute 
after it has been audited by a Member or Fellow of the Institute appointed by the 
President as Auditor. The Auditor shall report to the President. 

3. The Board of Directors shall have charge of the funds and of the affairs of the 
Institute, with the exception of those affairs specifically assigned to the President or to 
the Committee on Membership. The Board shall have authority to fill all vacancies 
ad interim, occurring among the Officers, Board of Directors, or in any of the Committees. 
The Board may appoint such other committees as may be required from time to time to 
carry on the affairs of the Institute. 

4. The Committee on Membership shall prepare and make available through the 
Secretary-Treasurer an announcement indicating the qualifications requisite for the 
different grades of membership. 

5. The Committee on Publications, under the general supervision of the Board of 
Directors, shall have charge of all matters connected with the publications of the Institute, 
and of all books, pamphlets, manuscripts and other literary or scientific material collected 
by the Institute. Once a year this Committee shall cause to be printed in the Official 
Journal the Constitution and By-Laws and a classified list of all the Members and Fellows 
of the Institute. 


ARTICLE II 
DUES 


1. Members shall pay five dollars at the time of admission to membership and shall 
receive the full current volume of the Official Journal. Thereafter, Members shall pay 
five dollars annual dues. The annual dues of Fellows shall be five dollars. The annual 
dues of Sustaining Members shall be fifty dollars. Honorary Members shall be exempt 
from all dues. 

2. Annual dues shall be payable on the first day of January of each year. 

3. The annual dues of a Fellow or Member include a subscription to the Official 
Journal. The annual dues of a Sustaining Member include two subscriptions to the 
Official Journal. 

4. It shall be the duty of the Secretary-Treasurer to notify by mail anyone whose dues 
may be six months in arrears, and to accompany such notice by a copy of this Article. 
If such person fail to pay such dues within three months from the date of mailing such 
notice, the Secretary-Treasurer shall report the delinquent one to the Board of Directors, 
by whom the person’s name may be stricken from the rolls and all privileges of member- 
ship withdrawn. Such person may, however, be re-instated by the Board of Directors 
upon payment of the arrears of dues. 


ARTICLE III 


SALARIES 


1. The Institute shall not pay a salary to any Officer, Director, or member of any 
committee. 


ARTICLE IV 


AMENDMENTS 


1. These By-Laws may be amended in the same manner as the Constitution or by a 
majority vote at any regularly convened meeting of the Institute, if the proposed amend- 
ment has been previously approved by the Board of Directors. 